Archive for andrew

Processing Data Streams with Amazon Kinesis and MongoDB Atlas

This post provides an introduction to Amazon Kinesis: its architecture, what it provides, and how it’s typically used. It goes on to step through how to implement an application where data is ingested by Amazon Kinesis before being processed and then stored in MongoDB Atlas.

This is part of a series of posts which examine how to use MongoDB Atlas with a number of complementary technologies and frameworks.

Introduction to Amazon Kinesis

The role of Amazon Kinesis is to get large volumes of streaming data into AWS where it can then be processed, analyzed, and moved between AWS services. The service is designed to ingest and store terabytes of data every hour, from multiple sources. Kinesis provides high availability, including synchronous replication within an AWS region. It also transparently handles scalability, adding and removing resources as needed.

Once the data is inside AWS, it can be processed or analyzed immediately, as well as being stored using other AWS services (such as S3) for later use. Storing the data in MongoDB allows it to be used both to drive real-time operational decisions and for deeper analysis.

As the number, variety, and velocity of data sources grow, new architectures and technologies are needed. Technologies like Amazon Kinesis and Apache Kafka are focused on ingesting the massive flow of data from multiple fire hoses and then routing it to the systems that need it – optionally filtering, aggregating, and analyzing en route.

Figure 1: AWS Kinesis Architecture

Typical data sources include:

  • IoT assets and devices (e.g., sensor readings)
  • Online purchases from an e-commerce store
  • Log files
  • Video game activity
  • Social media posts
  • Financial market data feeds

Rather than leave this data to fester in text files, Kinesis can ingest the data, allowing it to be processed to find patterns, detect exceptions, drive operational actions, and provide aggregations to be displayed through dashboards.

There are actually 3 services which make up Amazon Kinesis:

  • Amazon Kinesis Firehose is the simplest way to load massive volumes of streaming data into AWS. The capacity of your Firehose is adjusted automatically to keep pace with the stream throughput. It can optionally compress and encrypt the data before it’s stored.
  • Amazon Kinesis Streams are similar to the Firehose service but give you more control, allowing for:
    • Multi-stage processing
    • Custom stream partitioning rules
    • Reliable storage of the stream data until it has been processed.
  • Amazon Kinesis Analytics is the simplest way to process the data once it has been ingested by either Kinesis Firehose or Streams. The user provides SQL queries which are then applied to analyze the data; the results can then be displayed, stored, or sent to another Kinesis stream for further processing.

This post focuses on Amazon Kinesis Streams, in particular, implementing a consumer that ingests the data, enriches it, and then stores it in MongoDB.

Accessing Kinesis Streams – the Libraries

There are multiple ways to read (consume) and write (produce) data with Kinesis Streams:

  • Amazon Kinesis Streams API
  • Amazon Kinesis Producer Library (KPL)
    • An easy-to-use, highly configurable Java library that helps you put data into an Amazon Kinesis stream. The KPL presents a simple, asynchronous, high-throughput, and reliable interface.
  • Amazon Kinesis Agent
    • The agent continuously monitors a set of files and sends new entries to your Stream or Firehose.
  • Amazon Kinesis Client Library (KCL)
    • A Java library that helps you easily build Amazon Kinesis Applications for reading and processing data from an Amazon Kinesis stream. KCL handles issues such as adapting to changes in stream volume, load-balancing streaming data, coordinating distributed services, providing fault-tolerance, and processing data.
  • Amazon Kinesis Client Library MultiLangDaemon
    • The MultiLangDaemon acts as a proxy that lets non-Java applications use the Kinesis Client Library.
  • Amazon Kinesis Connector Library
    • A library that helps you easily integrate Amazon Kinesis with other AWS services and third-party tools.
  • Amazon Kinesis Storm Spout
    • A library that helps you easily integrate Amazon Kinesis Streams with Apache Storm.

The example application in this post uses the Kinesis Agent and the Kinesis Client Library MultiLangDaemon (with Node.js).

Role of MongoDB Atlas

MongoDB is a distributed database delivering a flexible schema for rapid application development, rich queries, idiomatic drivers, and built-in redundancy and scale-out. This makes it the go-to database for anyone looking to build modern applications.

MongoDB Atlas is a hosted database service for MongoDB. It provides all of the features of MongoDB, without the operational heavy lifting required for any new application. MongoDB Atlas is available on demand through a pay-as-you-go model and billed on an hourly basis, letting you focus on what you do best.

It’s easy to get started – use a simple GUI to select the instance size, region, and features you need. MongoDB Atlas provides:

  • Security features to protect access to your data
  • Built-in replication for always-on availability, tolerating complete data center failure
  • Backups and point-in-time recovery to protect against data corruption
  • Fine-grained monitoring to let you know when to scale. Additional instances can be provisioned with the push of a button
  • Automated patching and one-click upgrades for new major versions of the database, enabling you to take advantage of the latest and greatest MongoDB features
  • A choice of regions and billing options

Like Amazon Kinesis, MongoDB Atlas is a natural fit for users looking to simplify their development and operations work, letting them focus on what makes their application unique rather than commodity (albeit essential) plumbing. Also like Kinesis, you only pay for MongoDB Atlas when you’re using it with no upfront costs and no charges after you terminate your cluster.

Example Application

The rest of this post focuses on building a system to process log data. There are 2 sources of log data:

  1. A simple client that acts as a Kinesis Streams producer, generating sensor readings and writing them to a stream
  2. Amazon Kinesis Agent monitoring a SYSLOG file and sending each log event to a stream

In both cases, the data is consumed from the stream using the same consumer, which adds some metadata to each entry and then stores it in MongoDB Atlas.

Create Kinesis IAM Policy in AWS

From the IAM section of the AWS console use the wizard to create a new policy. The policy should grant permission to perform specific actions on a particular stream (in this case “ClusterDBStream”) and the results should look similar to this:
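As a rough sketch, a policy of this kind can be generated from JavaScript and pasted into the IAM policy editor. The action list, the region (`us-east-1`), and the account ID (`123456789012`) are assumptions to replace with your own values. Note that the Kinesis Client Library also needs DynamoDB permissions (it keeps its lease and checkpoint table there) and, unless metrics are disabled, CloudWatch permissions; those statements are not included in this sketch.

```javascript
// Sketch: build an IAM policy document that limits a user to reading from and
// writing to a single Kinesis stream. The action list is an assumption; trim it
// to the calls your producer and consumer actually make.
function buildStreamPolicy(region, accountId, streamName) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: [
          "kinesis:DescribeStream",
          "kinesis:ListShards",
          "kinesis:GetShardIterator",
          "kinesis:GetRecords",
          "kinesis:PutRecord",
          "kinesis:PutRecords"
        ],
        // Stream ARNs have the form arn:aws:kinesis:<region>:<account-id>:stream/<name>
        Resource: `arn:aws:kinesis:${region}:${accountId}:stream/${streamName}`
      }
    ]
  };
}

// Placeholder region and account ID.
const policy = buildStreamPolicy("us-east-1", "123456789012", "ClusterDBStream");
console.log(JSON.stringify(policy, null, 2));
```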

Next, create a new user and associate it with the new policy. Important: make a note of the user’s AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

Create MongoDB Atlas Cluster

Register with MongoDB Atlas and use the simple GUI to select the instance size, region, and features you need (Figure 2).

Figure 2: Create MongoDB Atlas Cluster

Create a user with read and write privileges for just the database that will be used for your application, as shown in Figure 3.

Figure 3: Creating an Application user in MongoDB Atlas

You must also add the IP address of your application server to the IP Whitelist in the MongoDB Atlas security tab (Figure 4). Note that if multiple application servers will be accessing MongoDB Atlas then an IP address range can be specified in CIDR format (IP Address/number of significant bits).
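To make the CIDR notation concrete: a `/16` entry fixes the first 16 bits of the address, so `10.0.0.0/16` covers 10.0.0.0 through 10.0.255.255 (65,536 addresses). The hypothetical helper below checks whether an application server's address falls inside such an entry; it only illustrates the arithmetic and is not part of MongoDB Atlas.

```javascript
// Convert a dotted-quad IPv4 address into an unsigned integer (0 .. 2^32 - 1).
function ipv4ToInt(ip) {
  const octets = ip.split(".").map(Number);
  if (octets.length !== 4 || octets.some((o) => !Number.isInteger(o) || o < 0 || o > 255)) {
    throw new Error(`Invalid IPv4 address: ${ip}`);
  }
  // Plain arithmetic (not bitwise operators) avoids JavaScript's signed 32-bit overflow.
  return octets.reduce((acc, o) => acc * 256 + o, 0);
}

// Return true if `ip` lies inside the CIDR block, e.g. "10.0.0.0/16".
function ipInCidr(ip, cidr) {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) {
    throw new Error(`Invalid prefix length in ${cidr}`);
  }
  // The block covers 2^(32 - bits) consecutive addresses...
  const blockSize = 2 ** (32 - bits);
  // ...starting at the base address rounded down to a multiple of the block size.
  const start = Math.floor(ipv4ToInt(base) / blockSize) * blockSize;
  const addr = ipv4ToInt(ip);
  return addr >= start && addr < start + blockSize;
}

console.log(ipInCidr("10.0.1.25", "10.0.0.0/16")); // true
console.log(ipInCidr("10.1.0.1", "10.0.0.0/16"));  // false
```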

Figure 4: Add App Server IP Address(es) to MongoDB Atlas

If your application server(s) are running in AWS, then an alternative to IP Whitelisting is to configure a VPC (Virtual Private Cloud) Peering relationship between your MongoDB Atlas group and the VPC containing your AWS resources. This removes the requirement to add and remove IP addresses as AWS reschedules functions, and is especially useful when using highly dynamic services such as AWS Lambda.

Click the “Connect” button and make a note of the URI that should be used when connecting to the database (note that you will need to replace the user name and password with the ones you’ve just created).

App Part 1 – Kinesis/Atlas Consumer

The code and configuration files in Parts 1 & 2 are based on the sample consumer and producer included with the client library for Node.js (MultiLangDaemon).

Install the Node.js client library:

git clone https://github.com/awslabs/amazon-kinesis-client-nodejs.git
cd amazon-kinesis-client-nodejs
npm install

Install the MongoDB Node.js Driver:

npm install --save mongodb

Move to the consumer sample folder:

cd samples/basic_sample/consumer/

Create a configuration file (“logging_consumer.properties”), taking care to set the correct stream and application names and AWS region:
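As a minimal sketch, the settings such a file needs can be expressed as a JavaScript object and rendered in the simple `key = value` properties format. The key names (`executableName`, `streamName`, `applicationName`, `AWSCredentialsProvider`, `initialPositionInStream`, `regionName`) follow the `sample.properties` file in the Node.js KCL samples; the application name and region are placeholder assumptions.

```javascript
// Sketch of logging_consumer.properties. Key names follow the sample.properties
// file in the Node.js KCL samples; values are placeholders to adjust.
const consumerSettings = {
  // Command the MultiLangDaemon runs to start the record processor.
  executableName: "node logging_consumer_app.js",
  streamName: "ClusterDBStream",
  // The KCL also uses this as the name of its DynamoDB lease/checkpoint table.
  applicationName: "ClusterDBConsumer",
  // Looks up credentials via the default chain, which includes the
  // AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables.
  AWSCredentialsProvider: "DefaultAWSCredentialsProviderChain",
  // Start from the oldest record still retained in the stream.
  initialPositionInStream: "TRIM_HORIZON",
  regionName: "us-east-1"
};

// Render the settings as "key = value" lines.
function toProperties(settings) {
  return Object.entries(settings)
    .map(([key, value]) => `${key} = ${value}`)
    .join("\n") + "\n";
}

console.log(toProperties(consumerSettings));
```

The printed text could be saved as “logging_consumer.properties” in the consumer folder.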

The code for working with MongoDB can be abstracted to a helper file (“db.js”):
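A minimal sketch of such a helper, assuming the official MongoDB Node.js driver (3.x or later): because the KCL calls the record processor repeatedly, the connection is opened once and reused rather than per record. `createDbHelper` and the database and collection names are hypothetical.

```javascript
// Sketch of a db.js helper. `connect` is any function returning a Promise for a
// connected client, e.g. () => MongoClient.connect(uri) with the official driver
// (3.x or later, where the promise resolves to a MongoClient).
function createDbHelper(connect, dbName, collectionName) {
  let clientPromise = null;

  // Connect on first use and share the same client across every batch of records.
  function getClient() {
    if (!clientPromise) {
      clientPromise = connect().catch((err) => {
        clientPromise = null; // allow a later call to retry the connection
        throw err;
      });
    }
    return clientPromise;
  }

  return {
    // Insert one document; resolves with the driver's insert result.
    async insert(doc) {
      const client = await getClient();
      return client.db(dbName).collection(collectionName).insertOne(doc);
    }
  };
}
```

With the driver, this could be wired up as `const db = createDbHelper(() => MongoClient.connect(mongodbConnectString), "clusterdb", "logs");` followed by `await db.insert(doc)`. Injecting `connect` also keeps the helper testable without a live cluster.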

Create the application Node.js file (“logging_consumer_app.js”), making sure to replace the database user and host details in “mongodbConnectString” with your own:

Note that this code adds some metadata to the received object before writing it to MongoDB. At this point, it is also possible to filter objects based on any of their fields.
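A minimal sketch of that enrichment step, assuming the record shape used by the Node.js KCL sample (each record in `processRecordsInput.records` carries its payload base64-encoded in `data`). The `metadata.mongoLabel` and `metadata.timeAdded` fields match the ones the validation rules later in this post expect; the helper names and the `program` filter are hypothetical.

```javascript
// Sketch: convert one record delivered by the KCL MultiLangDaemon into the
// document written to MongoDB.
function parsePayload(payload) {
  try {
    const parsed = JSON.parse(payload);
    // Only plain objects can carry extra fields; anything else is wrapped below.
    if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
      return parsed;
    }
  } catch (err) {
    // Not JSON: fall through and keep the raw text.
  }
  return { message: payload };
}

function recordToDocument(record, mongoLabel) {
  // The record payload arrives base64-encoded in record.data.
  const payload = Buffer.from(record.data, "base64").toString("utf8");
  return {
    ...parsePayload(payload),
    // Metadata added by the consumer (the same fields the validation rules check).
    metadata: { mongoLabel, timeAdded: new Date() }
  };
}

// Example filter: skip anything without a string "program" field.
function shouldStore(doc) {
  return typeof doc.program === "string";
}
```

Inside `processRecords`, each record could be passed through `recordToDocument` and written to MongoDB only when `shouldStore` returns true.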

Note also that this Node.js code logs a lot of information to the “application.log” file (including the database password!); this is for debugging only and should be removed from a real application.

The simplest way to have the application use the user credentials (noted when creating the user in AWS IAM) is to export them from the shell where the application will be launched:

export AWS_ACCESS_KEY_ID=????????????????????
export AWS_SECRET_ACCESS_KEY=????????????????????????????????????????

Finally, launch the consumer application:

../../../bin/kcl-bootstrap --java /usr/bin/java -e -p ./logging_consumer.properties

Check the “application.log” file for any errors.

App Part 2 – Kinesis Producer

Move to the producer sample folder and, as for the consumer, export the credentials for the user created in AWS IAM:

cd amazon-kinesis-client-nodejs/samples/basic_sample/producer

export AWS_ACCESS_KEY_ID=????????????????????
export AWS_SECRET_ACCESS_KEY=????????????????????????????????????????

Create the configuration file (“config.js”) and ensure that the correct AWS region and stream are specified:

Create the producer code (“logging_producer.js”):
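As a hedged sketch of what a producer like this does, assuming the AWS SDK for JavaScript v2 (the `aws-sdk` package): generate a reading and build the parameters for the Kinesis `PutRecord` call. `makeSensorReading`, its field names, and the `program` value are hypothetical.

```javascript
// Sketch for logging_producer.js: generate a fake sensor reading and build the
// parameters for Kinesis PutRecord (AWS SDK for JavaScript v2: kinesis.putRecord).
function makeSensorReading(sensorId, now = new Date()) {
  return {
    // Included so stored documents also satisfy a "program" validation rule.
    program: "sensor-producer",
    sensorId,
    // Random value between 0.0 and 100.0, rounded to one decimal place.
    value: Math.round(Math.random() * 1000) / 10,
    readingTime: now.toISOString()
  };
}

function buildPutRecordParams(streamName, reading) {
  return {
    StreamName: streamName,
    // Records that share a partition key go to the same shard, so each sensor's
    // readings keep their relative order.
    PartitionKey: String(reading.sensorId),
    Data: JSON.stringify(reading)
  };
}

const params = buildPutRecordParams("ClusterDBStream", makeSensorReading("sensor-1"));
console.log(params);
```

With the v2 SDK, the parameters could then be sent with `new AWS.Kinesis({ region: "us-east-1" }).putRecord(params, callback)`, where the region is again a placeholder.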

The producer is launched from “logging_producer_app.js”:

Run the producer:

node logging_producer_app.js

Check the consumer and producer “application.log” files for errors.

At this point, data should have been written to MongoDB Atlas. Using the connection string provided after clicking the “Connect” button in MongoDB Atlas, connect to the database and confirm that the documents have been added:

App Part 3 – Capturing Live Logs Using Amazon Kinesis Agent

Using the same consumer, the next step is to stream real log data. Fortunately, this doesn’t require any additional code as the Kinesis Agent can be used to monitor files and add every new entry to a Kinesis Stream (or Firehose).

Install the Kinesis Agent:

sudo yum install -y aws-kinesis-agent

and edit the configuration file to use the correct AWS region, user credentials, and stream in “/etc/aws-kinesis/agent.json”:
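A sketch of such a configuration, assuming the key names from the Kinesis Agent documentation (`filePattern`, `kinesisStream`, `dataProcessingOptions` with the `LOGTOJSON` option, and so on). It is written as a JavaScript object so it can be printed as JSON; every value is a placeholder, and the endpoint format should be checked against the agent documentation for your region.

```javascript
// Sketch of /etc/aws-kinesis/agent.json, built as a JavaScript object and
// printed as JSON. Credentials, endpoint, and stream name are placeholders.
const agentConfig = {
  "cloudwatch.emitMetrics": true,
  // Regional endpoint; selects the AWS region the agent sends data to.
  "kinesis.endpoint": "kinesis.us-east-1.amazonaws.com",
  awsAccessKeyId: "REPLACE_WITH_ACCESS_KEY_ID",
  awsSecretAccessKey: "REPLACE_WITH_SECRET_ACCESS_KEY",
  flows: [
    {
      // File to tail; by default each new line becomes one record.
      filePattern: "/var/log/messages",
      kinesisStream: "ClusterDBStream",
      // Convert each SYSLOG line into a JSON document before sending it.
      dataProcessingOptions: [{ optionName: "LOGTOJSON", logFormat: "SYSLOG" }]
    }
  ]
};

console.log(JSON.stringify(agentConfig, null, 2));
```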

“/var/log/messages” is a SYSLOG file and so a “dataProcessingOptions” field is included in the configuration to automatically convert each log into a JSON document before writing it to the Kinesis Stream.
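To illustrate what that conversion produces, here is a rough JavaScript approximation (not the agent's own implementation) that splits a classic syslog line into fields. `syslogToJson` is hypothetical; apart from `program`, which this post relies on, the field names are assumptions.

```javascript
// Rough approximation of SYSLOG-to-JSON conversion: split a classic syslog line
// into timestamp, hostname, program, optional process ID, and message.
const SYSLOG_LINE = /^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) ([^\s\[:]+)(?:\[(\d+)\])?: (.*)$/;

function syslogToJson(line) {
  const match = SYSLOG_LINE.exec(line);
  if (!match) return null; // not in the expected format
  const [, timestamp, hostname, program, processid, message] = match;
  const doc = { timestamp, hostname, program, message };
  if (processid !== undefined) doc.processid = processid;
  return doc;
}

console.log(syslogToJson("Nov 15 10:21:07 ip-172-31-22-5 ec2-user[3217]: This is a test log"));
```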

The agent does not run as root, so the permissions on “/var/log/messages” need to be relaxed to let it read the file:

sudo chmod og+r /var/log/messages

The agent can now be started:

sudo service aws-kinesis-agent start

Monitor the agent’s log file to see what it’s doing:

sudo tail -f /var/log/aws-kinesis-agent/aws-kinesis-agent.log

If there aren’t enough logs being generated on the machine then extra ones can be injected manually for testing:

logger -i This is a test log

This will create a log with the “program” field set to your username (in this case, “ec2-user”). Check that the logs get added to MongoDB Atlas:

Checking the Data with MongoDB Compass

To visually navigate through the MongoDB schema and data, download and install MongoDB Compass. Use your MongoDB Atlas credentials to connect Compass to your MongoDB database (the hostname should refer to the primary node in your replica set or a “mongos” process if your MongoDB cluster is sharded).

Navigate through the structure of the data in the “clusterdb” database (Figure 5) and view the JSON documents.

Figure 5: Explore Schema Using MongoDB Compass

Clicking on a value builds a query and then clicking “Apply” filters the results (Figure 6).

Figure 6: View Filtered Documents in MongoDB Compass

Add Document Validation Rules

One of MongoDB’s primary attractions for developers is that it gives them the ability to start application development without first needing to define a formal schema. Operations teams appreciate the fact that they don’t need to perform a time-consuming schema upgrade operation every time the developers need to store a different attribute.

This is well suited to the application built in this post, as logs from different sources are likely to include different attributes. There are, however, some attributes that we always expect to be present (e.g., the metadata that the application adds). For applications reading the documents from this collection to be able to rely on those fields being present, the documents should be validated before they are written to the database. Prior to MongoDB 3.2, those checks had to be implemented in the application, but they can now be performed by the database itself.

Executing a single command from the “mongo” shell adds the document checks:

The above command adds multiple checks:

  • The “program” field exists and contains a string
  • There’s a sub-document called “metadata” containing at least 2 fields:
    • “mongoLabel” which must be a string
    • “timeAdded” which must be a date
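For reference, the checks listed above could be expressed as the following validator document (MongoDB 3.2+ query-operator syntax), sketched here in JavaScript. Applying it from the “mongo” shell would use something like `db.runCommand({ collMod: "<collection>", validator: logValidator })`, where `<collection>` is a placeholder for the collection your consumer writes to. `satisfiesLogRules` is a hypothetical, simplified application-side mirror of the same rules.

```javascript
// Validator in MongoDB 3.2+ query-operator syntax ($type accepts string aliases
// such as "string" and "date" from 3.2 onwards).
const logValidator = {
  program: { $exists: true, $type: "string" },
  "metadata.mongoLabel": { $type: "string" },
  "metadata.timeAdded": { $type: "date" }
};

// Simplified application-side mirror of the same rules, handy for unit tests or
// for rejecting bad records before they reach the database.
function satisfiesLogRules(doc) {
  const meta = doc.metadata;
  return typeof doc.program === "string" &&
    meta !== null && typeof meta === "object" &&
    typeof meta.mongoLabel === "string" &&
    meta.timeAdded instanceof Date;
}
```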

Test that the rules are correctly applied when attempting to write to the database:

Cleaning Up (IMPORTANT!)

Remember that you will continue to be charged for the services even when you’re no longer actively using them. If you no longer need to use the services then clean up:

  • From the MongoDB Atlas GUI, select your cluster, click on the ellipsis, and select “Terminate”.
  • From the AWS management console select the Kinesis service, then Kinesis Streams, and then delete your stream.
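  • From the AWS management console select the DynamoDB service, then Tables, and then delete your table (the Kinesis Client Library creates this table, named after the consumer’s application name, to track its leases and checkpoints).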

Using MongoDB Atlas with Other Frameworks and Services

We have detailed walkthroughs for using MongoDB Atlas with several programming languages and frameworks, as well as generic instructions that can be used with others. They can all be found in Using MongoDB Atlas From Your Favorite Language or Framework.





Building Microservices with MongoDB, Docker, Kubernetes & Kafka

Building Microservices with Docker, Kubernetes, Kafka & MongoDB

As part of MongoDB Europe on 15th November, I’ll be presenting on Microservices and some of the key technologies that enable them. Tickets are still available and the discount code andrewmorgan20 saves you 20% – register here.

Session Abstract

Organisations are building their applications around microservice architectures because of the flexibility, speed of delivery, and maintainability they deliver.

Want to try out MongoDB on your laptop? Execute a single command and you have a lightweight, self-contained sandbox; another command removes all trace when you’re done. Need an identical copy of your application stack in multiple environments? Build your own container image and then your entire development, test, operations, and support teams can launch an identical clone environment.

Containers are revolutionising the entire software lifecycle: from the earliest technical experiments and proofs of concept through development, test, deployment, and support. Orchestration tools manage how multiple containers are created, upgraded and made highly available. Orchestration also controls how containers are connected to build sophisticated applications from multiple, microservice containers.

This session introduces you to technologies such as Docker, Kubernetes & Kafka which are driving the microservices revolution. Learn about containers and orchestration – and most importantly how to exploit them for stateful services such as MongoDB.





The rise of microservices – containers and orchestration

Earlier this week, I presented on microservices at MongoDB’s Big Data event in Frankfurt. You can view the slides here.


Abstract

Organisations are building their applications around microservice architectures because of the flexibility, speed of delivery, and maintainability they deliver. In this session, the concepts behind containers and orchestration will be explained and how to use them with MongoDB.





Webinar Replay: Data Streaming with Apache Kafka & MongoDB

I recently co-presented a webinar with David Tucker from Confluent.

The replay is now available: Data Streaming with Apache Kafka & MongoDB.

Abstract

A new generation of technologies is needed to consume and exploit today’s real time, fast moving data sources. Apache Kafka, originally developed at LinkedIn, has emerged as one of these key new technologies.

This webinar explores the use-cases and architecture for Kafka, and how it integrates with MongoDB to build sophisticated data-driven applications that exploit new sources of data.

Watch the webinar to learn:

  • What MongoDB is and where it’s used
  • What data streaming is and where it fits into modern data architectures
  • How Kafka works, what it delivers, and where it’s used
  • How to operationalize the Data Lake with MongoDB & Kafka
  • How MongoDB integrates with Kafka – both as a producer and a consumer of event data





Using MongoDB Atlas From Your Favorite Language or Framework

Developers love working with MongoDB. One reason is the flexible data model, another is that there’s an idiomatic driver for just about every programming language and someone’s probably already built a framework on top of MongoDB that takes care of a lot of the grunt work. With high availability and scaling built in, they can also be confident that MongoDB will continue to meet their needs as their business grows.

MongoDB Atlas provides all of the features of MongoDB, without the operational heavy lifting required for any new application. MongoDB Atlas is available on demand through a pay-as-you-go model and billed on an hourly basis, letting you focus on what you do best.

It’s easy to get started – use a simple GUI to select the instance size, region, and features you need (Figure 1).

Figure 1: Create MongoDB Atlas Cluster

MongoDB Atlas provides:

  • Security features to protect access to your data
  • Built-in replication for always-on availability, tolerating complete data center failure
  • Backups and point-in-time recovery to protect against data corruption
  • Fine-grained monitoring to let you know when to scale. Additional instances can be provisioned with the push of a button
  • Automated patching and one-click upgrades for new major versions of the database, enabling you to take advantage of the latest and greatest MongoDB features
  • A choice of cloud providers, regions, and billing options

This post provides instructions on how to use MongoDB Atlas directly from your application or how to configure your favorite framework to use it. It goes on to provide links to some worked examples for specific frameworks.

Worked Examples for Specific Frameworks

Detailed walkthroughs are available for specific programming languages and frameworks:

This list will be extended as new blog posts are produced. If your preferred language or framework isn’t listed above then read on as the following, generic instructions cover most other cases.

Preparing MongoDB Atlas For Your Application

Launch your MongoDB cluster using MongoDB Atlas and then (optionally) create a user with read and write privileges for just the database that will be used for your application, as shown in Figure 2.

Figure 2: Creating an Application user in MongoDB Atlas

You must also add the IP address of your application server to the IP Whitelist in the MongoDB Atlas security tab (Figure 3). Note that if multiple application servers will be accessing MongoDB Atlas then an IP address range can be specified in CIDR format (IP Address/number of significant bits).

Figure 3: Add App Server IP Address(es) to MongoDB Atlas

Connecting Your Application (Framework) to MongoDB Atlas

The exact way that you specify how to connect to MongoDB Atlas will vary depending on your programming language and (optionally) the framework you’re using. However, you will almost always need to provide a connection string/URI. The core of this URI can be retrieved by clicking on the CONNECT button for your cluster in the MongoDB Atlas GUI, selecting the MongoDB Drivers tab and then copying the string (Figure 4).

Figure 4: Copy MongoDB Atlas Connection String/URI

Note that this URI contains the administrator username for your MongoDB Atlas group and will connect to the admin database – you’ll probably want to change that.

Your final URI should look something like this:

mongodb://appuser:my_password@cluster0-shard-00-00-qfovx.mongodb.net:27017,cluster0-shard-00-01-qfovx.mongodb.net:27017,cluster0-shard-00-02-qfovx.mongodb.net:27017/appdatabase?ssl=true&authSource=admin

The URI contains these components:

  • appuser is the name of the user you created in the MongoDB Atlas UI.
  • my_password is the password you chose when creating the user in MongoDB Atlas.
  • cluster0-shard-00-00-qfovx.mongodb.net, cluster0-shard-00-01-qfovx.mongodb.net, & cluster0-shard-00-02-qfovx.mongodb.net are the hostnames of the instances in your MongoDB Atlas replica set (click on the “CONNECT” button in the MongoDB Atlas UI if you don’t have these).
  • 27017 is the standard MongoDB port number.
  • appdatabase is the name of the database (schema) that your application or framework will use. Note that for some frameworks, this should be omitted and the database name configured separately – check the default configuration file or documentation for your framework to see if it’s possible to provide the database name outside of the URI.
  • To enforce security, MongoDB Atlas mandates that the ssl option is used.
  • admin is the database that’s being used to store the credentials for appuser.
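A hypothetical helper (`buildAtlasUri`, not part of any driver) that assembles the same URI from the components listed above. It also percent-encodes the user name and password, which matters if the password contains characters such as `@`, `:` or `/` that would otherwise be read as URI delimiters.

```javascript
// Sketch: assemble a MongoDB Atlas connection URI from its parts. The user name
// and password are percent-encoded so special characters cannot break the URI.
function buildAtlasUri({ user, password, hosts, database, port = 27017, authSource = "admin" }) {
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  const hostList = hosts.map((host) => `${host}:${port}`).join(",");
  return `mongodb://${credentials}@${hostList}/${database}?ssl=true&authSource=${authSource}`;
}

const uri = buildAtlasUri({
  user: "appuser",
  password: "my_password",
  hosts: [
    "cluster0-shard-00-00-qfovx.mongodb.net",
    "cluster0-shard-00-01-qfovx.mongodb.net",
    "cluster0-shard-00-02-qfovx.mongodb.net"
  ],
  database: "appdatabase"
});
console.log(uri);
```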

Check Your Application Data

At this point, you should add some test data through your application and then confirm that it’s being correctly stored in MongoDB Atlas.

MongoDB Compass is the GUI for MongoDB, allowing you to visually explore your data and interact with it using full CRUD functionality. The same credentials can be used to connect Compass to your MongoDB database (Figure 5).

Figure 5: Connect MongoDB Compass to MongoDB Atlas

Once connected, explore the data added to your collections (Figure 6).

Figure 6: Explore MongoDB Atlas Data Using MongoDB Compass

It is also possible to add, delete, and modify documents (Figure 7).

Figure 7: Modify a Document in MongoDB Compass

You can verify that the document has really been updated from the MongoDB shell:

Cluster0-shard-0:PRIMARY> use appdatabase
Cluster0-shard-0:PRIMARY> db.simples.find({
    first_name: "Stephanie", 
    last_name: "Green"}).pretty()
{
    "_id" : ObjectId("57a206be0e8ecb0d5b5549f9"),
    "first_name" : "Stephanie",
    "last_name" : "Green",
    "email" : "sgreen1b@tiny.cc",
    "gender" : "Female",
    "ip_address" : "129.173.45.61",
    "children" : [
        {
            "first_name" : "Eugene",
            "birthday" : "8/25/1985"
        },
        {
            "first_name" : "Nicole",
            "birthday" : "12/29/1963",
            "favoriteColor" : "Yellow"
        }
    ]
}

Migrating Your Data to MongoDB Atlas

This post has assumed that you’re building a new application but what if you already have one, with data stored in a MongoDB cluster that you’re managing yourself? Fortunately, the process to migrate your data to MongoDB Atlas (and back out again if desired) is straightforward and is described in Migrating Data to MongoDB Atlas.

We offer a MongoDB Atlas Migration service to help you properly configure MongoDB Atlas and develop a migration plan. This is especially helpful if you need to minimize downtime for your application, if you have a complex sharded deployment, or if you want to revise your deployment architecture as part of the migration. Contact us to learn more about the MongoDB Atlas Migration service.

Next Steps

While MongoDB Atlas radically simplifies the operation of MongoDB, there are still some decisions to make to ensure the best performance and reliability for your application. The MongoDB Atlas Best Practices white paper provides guidance on best practices for deploying, managing, and optimizing the performance of your database with MongoDB Atlas.

The guide outlines considerations for achieving performance at scale with MongoDB Atlas across a number of key dimensions, including instance size selection, application patterns, schema design and indexing, and disk I/O. While this guide is broad in scope, it is not exhaustive. Following the recommendations in the guide will provide a solid foundation for ensuring optimal application performance.





Configuring KeystoneJS to Use MongoDB Atlas

KeystoneJS is an open source framework for building web applications and Content Management Systems. It’s built on top of MongoDB, Express, and Node.js – key components of the ubiquitous MEAN stack.

This post explains why MongoDB Atlas is an ideal choice for KeystoneJS and then goes on to show how to configure KeystoneJS to use it.

Why KeystoneJS and MongoDB Atlas Are a Good Match

The MEAN stack is extremely popular and well supported, and it’s the go-to platform when developing modern applications. For its part, MongoDB brings flexible schemas, rich queries, an idiomatic Node.js driver, and simple-to-use high availability and scaling.

MongoDB Atlas provides all of the features of MongoDB, without the operational heavy lifting required for any new application. MongoDB Atlas is available on demand through a pay-as-you-go model and billed on an hourly basis, letting you focus on what you do best.

It’s easy to get started – use a simple GUI to select the instance size, region, and features you need. MongoDB Atlas provides:

  • Security features to protect access to your data
  • Built-in replication for always-on availability, tolerating complete data center failure
  • Backups and point-in-time recovery to protect against data corruption
  • Fine-grained monitoring to let you know when to scale. Additional instances can be provisioned with the push of a button
  • Automated patching and one-click upgrades for new major versions of the database, enabling you to take advantage of the latest and greatest MongoDB features
  • A choice of cloud providers, regions, and billing options

Like KeystoneJS, MongoDB Atlas is a natural fit for users looking to simplify their development and operations work, letting them focus on what makes their application unique rather than commodity (albeit essential) plumbing.

Installing KeystoneJS and Configuring it to Use MongoDB Atlas

Before starting with KeystoneJS, you should launch your MongoDB cluster using MongoDB Atlas and then (optionally) create a user with read and write privileges for just the database that will be used for this project, as shown in Figure 1. You must also add the IP address of your application server to the IP Whitelist in the MongoDB Atlas security tab.

Figure 1: Creating KeystoneJS user in MongoDB Atlas

If it isn’t already installed on your system, download and install Node.js:


You should then add the Node.js bin sub-folder to the PATH in your .bash_profile file and install KeystoneJS:

Before starting KeystoneJS you need to configure it with details on how to connect to your specific MongoDB Atlas cluster. This is done by updating the MONGO_URI value within the .env file:

The URI contains these components:

  • keystonejs_user is the name of the user you created in the MongoDB Atlas UI
  • my_password is the password you chose when creating the user in MongoDB Atlas
  • cluster0-shard-00-00-qfovx.mongodb.net, cluster0-shard-00-01-qfovx.mongodb.net, & cluster0-shard-00-02-qfovx.mongodb.net are the hostnames of the instances in your MongoDB Atlas replica set (click on the “CONNECT” button in the MongoDB Atlas UI if you don’t have these)
  • 27017 is the standard MongoDB port number
  • clusterdb is the name of the database (schema) that KeystoneJS will use (note that this must match the project name used when installing KeystoneJS as well as the database you granted the user access to)
  • To enforce security, MongoDB Atlas mandates that the ssl option is used
  • admin is the database that’s being used to store the credentials for keystonejs_user
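Because a mismatch between the database in MONGO_URI and the project name is an easy mistake to make, a small hypothetical check like the one below can catch it before KeystoneJS starts. `databaseFromUri` and the sample URI are illustrative assumptions, and the parsing assumes the credentials are percent-encoded.

```javascript
// Sketch: pull the database name out of a MONGO_URI value so it can be compared
// with the KeystoneJS project name. Assumes percent-encoded credentials, so the
// last "@" separates them from the host list.
function databaseFromUri(uri) {
  // Drop the scheme, then any credentials (everything up to the last "@").
  const withoutScheme = uri.replace(/^mongodb:\/\//, "");
  const hostsAndPath = withoutScheme.slice(withoutScheme.lastIndexOf("@") + 1);
  const slash = hostsAndPath.indexOf("/");
  if (slash === -1) return null; // no database in the URI
  const path = hostsAndPath.slice(slash + 1).split("?")[0];
  return path === "" ? null : decodeURIComponent(path);
}

// In the application this value would typically come from process.env.MONGO_URI.
const mongoUri = "mongodb://keystonejs_user:my_password@cluster0-shard-00-00-qfovx.mongodb.net:27017," +
  "cluster0-shard-00-01-qfovx.mongodb.net:27017,cluster0-shard-00-02-qfovx.mongodb.net:27017" +
  "/clusterdb?ssl=true&authSource=admin";
const projectName = "clusterdb";

if (databaseFromUri(mongoUri) !== projectName) {
  console.error(`MONGO_URI database does not match project name "${projectName}"`);
} else {
  console.log("MONGO_URI database matches the project name");
}
```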

Clients connect to KeystoneJS through port 3000 and so you must open that port in your firewall.

You can then start KeystoneJS:

$ node keystone

Testing the Configuration

Browse to the application at http://address-of-app-server:3000 as shown in Figure 2.

Figure 2: KeystoneJS Running on MongoDB Atlas

Sign in using the credentials shown and then confirm that you can upload some images to a gallery and create a new page as shown in Figure 3.

Figure 3: Create a Page in KeystoneJS with Data Stored in MongoDB Atlas

After saving the page, confirm that you can browse to the newly created post (Figure 4).

Figure 4: View KeystoneJS Post with Data Read from MongoDB Atlas

Optionally, to confirm that MongoDB Atlas really is being used by KeystoneJS, connect using the MongoDB shell:

To visually navigate through the schema and data created by KeystoneJS, download and install MongoDB Compass. The same credentials can be used to connect Compass to your MongoDB database – Figure 5.

Figure 5: Connect MongoDB Compass to MongoDB Atlas Database

Navigate through the structure of the data in the clusterdb database (Figure 6) and view the JSON documents (Figure 7).

Figure 6: Explore KeystoneJS Schema Using MongoDB Compass

Figure 7: View Documents Stored by KeystoneJS Using MongoDB Atlas

Next Steps

While MongoDB Atlas radically simplifies the operation of MongoDB, there are still some decisions to make to ensure the best performance and reliability for your application. The MongoDB Atlas Best Practices white paper provides guidance on best practices for deploying, managing, and optimizing the performance of your database with MongoDB Atlas.

The guide outlines considerations for achieving performance at scale with MongoDB Atlas across a number of key dimensions, including instance size selection, application patterns, schema design and indexing, and disk I/O. While this guide is broad in scope, it is not exhaustive. Following the recommendations in the guide will provide a solid foundation for ensuring optimal application performance.





Migrating Data to MongoDB Atlas

MongoDB Atlas was announced at this year’s MongoDB World. It’s great not just for new applications, but also for your existing MongoDB databases running on other platforms. This post will focus on how to migrate your data and applications over to MongoDB Atlas.

What is MongoDB Atlas?

MongoDB Atlas provides all of the features of MongoDB, without the operational heavy lifting required for any new application. MongoDB Atlas is available on demand through a pay-as-you-go model and billed on an hourly basis, letting you focus on what you do best.

It’s easy to get started – use a simple GUI to select the instance size, region, and features you need. MongoDB Atlas provides:

  • Security features to protect access to your data
  • Built-in replication for always-on availability, tolerating complete data center failure
  • Backups and point-in-time recovery to protect against data corruption
  • Fine-grained monitoring to let you know when to scale. Additional instances can be provisioned with the push of a button
  • Automated patching and one-click upgrades for new major versions of the database, enabling you to take advantage of the latest and greatest MongoDB features
  • A choice of cloud providers, regions, and billing options

But what if you already have application data held in your own on-prem or cloud-based MongoDB database – is it possible to safely migrate that data to MongoDB Atlas? What if your data is held in a 3rd party hosted MongoDB service such as Compose or mLab? Conversely, is it possible to build your application against MongoDB Atlas and then move the data to a MongoDB database running on another platform in the future?

The answer to all of those questions is “yes”. In the future, you should expect this to be a highly automated process, but right now it involves some manual steps – the purpose of this blog post is to describe the process.

Moving Your Application Data to MongoDB Atlas

The procedure is very straightforward, but if you can’t tolerate losing any of your updates then it does involve stopping application writes for a period. That means it’s vital that you prepare in advance in order to minimize the impact.

Pre-Migration Checklist

  • How long will writes need to be stopped? Perform a dry run of the mongodump & mongorestore steps, without stopping application writes, to answer this.
  • When will the stopping of writes have the smallest impact?
  • What can you change in the application to minimize the impact, e.g. provide a read-only version of the service when it isn’t possible to write to the database?
  • Will you warn users of planned maintenance ahead of time?
  • Do you have sufficient storage space to store the dumped data on the machine where you plan to run mongodump?
  • Once the data has been migrated to MongoDB Atlas, the application will need to switch its database connections to the new address; identify how this will be done.
  • List the IP Addresses of all the machines that will need to connect to MongoDB Atlas – this includes your application nodes as well as the machine where mongorestore will be run. These will need to be added to your MongoDB Atlas group’s whitelist.
  • Decide on what MongoDB Atlas instance size to use and, if necessary, how many shards will be needed.
  • Decide on which region to use, e.g. co-locating the MongoDB Atlas instances with your cloud-based application servers.

Execute the Migration

  • Create the MongoDB Atlas cluster.
  • Add the required IP Addresses to the whitelist in your group’s security tab.
  • Stop database writes to your existing database, either in your application logic or by running db.fsyncLock() against the original MongoDB deployment; this blocks writes to every database (schema) on that mongod instance, not just one:
laptop> mongo --host=ec2-52-208-185-213.eu-west-1.compute.amazonaws.com \
    --eval "db.fsyncLock()"
  • Back up the data from the existing database (writes the data to a directory named dump):
laptop> mongodump --host=ec2-52-208-185-213.eu-west-1.compute.amazonaws.com \
    --port=27017
  • Write the data to MongoDB Atlas (using the connection information provided in the Web UI):
mongorestore --ssl --host cluster0-shard-00-00-qfovx.mongodb.net \
    --port 27017 -u billy -p XXX dump
  • Switch the application’s database connections over to your MongoDB Atlas instance.
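The checklist suggests timing a dry run before the real cut-over. As a minimal sketch of how that rehearsal could be scripted, the hypothetical helpers below (buildMigrationSteps and timeSteps are illustrative names, not part of any MongoDB tool) assemble the same three commands as argument arrays and time each one through a runner you supply. It assumes only the flags shown above, plus mongodump's --out; -p is left off so that mongorestore prompts for the password instead of exposing it on the command line. The MIGRATION_DRY_RUN environment variable is also an assumption; without it the script just prints the plan.

```javascript
// Hypothetical helper for rehearsing the migration; hosts and user are the
// placeholders used in the steps above.
const { spawnSync } = require("child_process");

function buildMigrationSteps({ sourceHost, sourcePort = 27017, atlasHost, atlasUser, dumpDir = "dump" }) {
  return [
    // Step 1: block writes; db.fsyncLock() locks the whole mongod instance.
    ["mongo", `--host=${sourceHost}`, "--eval", "db.fsyncLock()"],
    // Step 2: dump the source databases into dumpDir.
    ["mongodump", `--host=${sourceHost}`, `--port=${sourcePort}`, `--out=${dumpDir}`],
    // Step 3: restore into Atlas over TLS; no -p, so mongorestore prompts for the password.
    ["mongorestore", "--ssl", "--host", atlasHost, "--port", "27017", "-u", atlasUser, dumpDir],
  ];
}

// Runs each step through `run` (which returns an exit status) and records its duration.
function timeSteps(steps, run) {
  const timings = [];
  for (const argv of steps) {
    const start = Date.now();
    const status = run(argv);
    timings.push({ command: argv[0], status, ms: Date.now() - start });
    if (status !== 0) break; // stop at the first failing (or missing) command
  }
  return timings;
}

const steps = buildMigrationSteps({
  sourceHost: "ec2-52-208-185-213.eu-west-1.compute.amazonaws.com",
  atlasHost: "cluster0-shard-00-00-qfovx.mongodb.net",
  atlasUser: "billy",
});

if (process.env.MIGRATION_DRY_RUN === "1") {
  // Dry run: skip the fsyncLock step so application writes carry on.
  const realRun = (argv) => spawnSync(argv[0], argv.slice(1), { stdio: "inherit" }).status;
  console.table(timeSteps(steps.slice(1), realRun));
} else {
  steps.forEach((argv) => console.log(argv.join(" ")));
}
```

Injecting the runner keeps the plan testable without touching a database; the measured mongodump and mongorestore times give a first estimate of how long writes will need to be stopped.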

Want more help? We offer a MongoDB Atlas Migration service to help you properly configure MongoDB Atlas and develop a migration plan. This is especially helpful if you need to minimize downtime for your application, if you have a complex sharded deployment, or if you want to revise your deployment architecture as part of the migration. Contact us to learn more about the MongoDB Atlas Migration service.

Moving Your Application Data Out of MongoDB Atlas

To migrate data out, you can download a MongoDB Atlas backup and then copy the contents to the receiving MongoDB cluster; the documentation describes how to load the data into the receiving replica set. The backup can be either a periodic snapshot or a point-in-time view of the MongoDB Atlas database. If you can’t tolerate lost writes, they must be stopped by the application (fsyncLock is not available in MongoDB Atlas).

Getting the Best Out of MongoDB Atlas

While MongoDB Atlas radically simplifies the operation of MongoDB, there are still some decisions to make to ensure the best performance and reliability for your application. The MongoDB Atlas Best Practices white paper provides guidance on best practices for deploying, managing, and optimizing the performance of your database with MongoDB Atlas.

The guide outlines considerations for achieving performance at scale with MongoDB Atlas across a number of key dimensions, including instance size selection, application patterns, schema design and indexing, and disk I/O. While this guide is broad in scope, it is not exhaustive. Following the recommendations in the guide will provide a solid foundation for ensuring optimal application performance.





MongoDB & Data Streaming – Implementing a MongoDB Kafka Consumer

Data Streaming

In today’s data landscape, no single system can provide all of the required perspectives to deliver real insight. Deriving the full meaning from data requires mixing huge volumes of information from many sources.

At the same time, we’re impatient to get answers instantly; if the time to insight exceeds tens of milliseconds then the value is lost – applications such as high frequency trading, fraud detection, and recommendation engines can’t afford to wait. This often means analyzing the inflow of data before it even makes it to the database of record. Add in zero tolerance for data loss and the challenge gets even more daunting.

Kafka and data streams are focused on ingesting the massive flow of data from multiple fire-hoses and then routing it to the systems that need it – filtering, aggregating, and analyzing en-route.

This blog introduces Apache Kafka and then illustrates how to use MongoDB as a source (producer) and destination (consumer) for the streamed data. A more complete study of this topic can be found in the Data Streaming with Kafka & MongoDB white paper.

Apache Kafka

Kafka provides a flexible, scalable, and reliable method to communicate streams of event data from one or more producers to one or more consumers. Examples of events include:

  • A periodic sensor reading such as the current temperature
  • A user adding an item to the shopping cart in an online store
  • A Tweet being sent with a specific hashtag

Streams of Kafka events are organized into topics. A producer chooses a topic to send a given event to, and consumers select which topics they pull events from. For example, a financial application could pull NYSE stock trades from one topic, and company financial announcements from another in order to look for trading opportunities.

In Kafka, topics are further divided into partitions to support scale out. Each Kafka node (broker) is responsible for receiving, storing, and passing on all of the events from one or more partitions for a given topic. In this way, the processing and storage for a topic can be linearly scaled across many brokers. Similarly, an application may scale out by using many consumers for a given topic, with each pulling events from a discrete set of partitions.

Figure 1: Kafka Producers, Consumers, Topics, and Partitions
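To make the partition idea concrete, here is a small in-memory sketch. It is not Kafka code: partitionFor uses a simple 31-multiplier string hash rather than the murmur2 hash used by Kafka's default partitioner for keyed messages, and assignPartitions spreads partitions round-robin across the consumers in a group, similar in spirit to Kafka's round-robin assignor (Kafka also ships a range assignor that groups partitions differently). The consumer names are hypothetical.

```javascript
// Illustrative only: Kafka's default partitioner hashes keys with murmur2;
// a simple 31-multiplier hash is used here to show the same idea.
function partitionFor(key, numPartitions) {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep an unsigned 32-bit value
  }
  return hash % numPartitions; // same key -> same partition, so per-key order is kept
}

// Round-robin assignment: each consumer pulls from a discrete set of partitions.
function assignPartitions(numPartitions, consumers) {
  const assignment = new Map(consumers.map((c) => [c, []]));
  for (let p = 0; p < numPartitions; p++) {
    assignment.get(consumers[p % consumers.length]).push(p);
  }
  return assignment;
}

const assignment = assignPartitions(6, ["consumer-a", "consumer-b", "consumer-c"]);
for (const [consumer, partitions] of assignment) {
  console.log(`${consumer} reads partitions ${partitions.join(", ")}`);
}
console.log(`"NYSE:IBM" goes to partition ${partitionFor("NYSE:IBM", 6)}`);
```

Adding a fourth consumer to the list spreads the same six partitions more thinly, which is the scale-out described above; the per-key hash is what keeps all events for one key in order within a single partition.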

MongoDB As A Kafka Consumer – A Java Example

In order to use MongoDB as a Kafka consumer, the received events must be converted into BSON documents before they are stored in the database. In this example, the events are strings representing JSON documents. The strings are converted to Java objects so that they are easy for Java developers to work with; those objects are then transformed into BSON documents.
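In Node.js the same convert-then-store flow needs less ceremony, because the MongoDB Node.js driver serializes plain JavaScript objects to BSON itself. The following is a minimal sketch of the idea, not a port of the Java code: the Fish fields (name, breed, age) are hypothetical, since the contents of Fish.json are not reproduced here, and the validation rules are illustrative.

```javascript
// Hypothetical event shape; the real Fish.json fields may differ.
class Fish {
  constructor({ name, breed = null, age = null }) {
    if (typeof name !== "string" || name.length === 0) {
      throw new TypeError("Fish event is missing a name");
    }
    this.name = name;
    this.breed = breed;
    this.age = Number.isInteger(age) ? age : null; // store null instead of a malformed age
  }

  // Kafka delivers the event as a string holding a JSON document.
  static fromEventString(message) {
    return new Fish(JSON.parse(message));
  }

  // The Node.js driver converts plain objects to BSON on insert,
  // so this is what would be passed to collection.insertOne().
  toDocument() {
    return { name: this.name, breed: this.breed, age: this.age };
  }
}

const doc = Fish.fromEventString('{"name":"Nemo","breed":"Clownfish","age":2}').toDocument();
console.log(doc);
```

Validating in the business object means a malformed event is rejected before it reaches the database, rather than being stored and discovered later.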

Complete source code, Maven configuration, and test data can be found further down, but here are some of the highlights, starting with the main loop for receiving and processing event messages from the Kafka topic:

The Fish class includes helper methods to hide how the objects are converted into BSON documents:

In a real application more would be done with the received messages – they could be combined with reference data read from MongoDB, acted on and then passed along the pipeline by publishing to additional topics. In this example, the final step is to confirm from the mongo shell that the data has been added to the database:

Full Java Code for MongoDB Kafka Consumer

Business Object – Fish.java

Kafka Consumer for MongoDB – MongoDBSimpleConsumer.java

Note that this example consumer is written using the Kafka Simple Consumer API – there is also a Kafka High Level Consumer API which hides much of the complexity – including managing the offsets. The Simple API provides more control to the application but at the cost of writing extra code.
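The main extra work the Simple Consumer API leaves to the application is offset bookkeeping. Here is a minimal in-memory sketch of that bookkeeping; OffsetTracker and pollOnce are hypothetical names, and the array standing in for a partition and the store callback standing in for the MongoDB insert are assumptions, not Kafka or driver APIs. The offset advances only after an event has been stored, so a failed write means the event is read again on the next poll (at-least-once delivery).

```javascript
// In-memory stand-ins: `log` plays the role of one Kafka partition and
// `store` the MongoDB write. Neither is a real Kafka or driver API.
class OffsetTracker {
  constructor() {
    this.nextByPartition = new Map();
  }
  nextOffset(partition) {
    return this.nextByPartition.get(partition) ?? 0;
  }
  // Record that `offset` was processed, so reading resumes after it.
  commit(partition, offset) {
    this.nextByPartition.set(partition, offset + 1);
  }
}

// Reads up to maxMessages from the tracked offset and stores them in order.
// The offset only advances after a successful store (at-least-once delivery).
function pollOnce(log, partition, tracker, store, maxMessages = 10) {
  const start = tracker.nextOffset(partition);
  const batch = log.slice(start, start + maxMessages);
  let stored = 0;
  for (const [i, message] of batch.entries()) {
    try {
      store(message);
    } catch (err) {
      console.error(`offset ${start + i} not stored, will retry: ${err.message}`);
      break;
    }
    tracker.commit(partition, start + i);
    stored += 1;
  }
  return stored;
}

const partition0 = ['{"name":"Nemo"}', '{"name":"Dory"}', '{"name":"Marlin"}'];
const tracker = new OffsetTracker();
const saved = [];
pollOnce(partition0, 0, tracker, (m) => saved.push(JSON.parse(m)), 2); // stores offsets 0 and 1
pollOnce(partition0, 0, tracker, (m) => saved.push(JSON.parse(m)), 2); // stores offset 2
console.log(saved.length, tracker.nextOffset(0));
```

Because a retried event may already have been written once, an idempotent write (for example, an upsert keyed on a unique field) pairs naturally with this at-least-once pattern.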

Maven Dependencies – pom.xml

Test Data – Fish.json

A sample of the test data injected into Kafka is shown below:

For simple testing, this data can be injected into the clusterdb-topic1 topic using the kafka-console-producer.sh command.

Next Steps

To learn much more about data streaming and how MongoDB fits in (including Apache Kafka and competing and complementary technologies) read the Data Streaming with Kafka & MongoDB white paper and watch the webinar replay.





Using PencilBlue with MongoDB Atlas

PencilBlue is a Node.js-based, open source blogging and Content Management System, targeted at enterprise-grade websites.

This post explains why MongoDB Atlas is an ideal choice for PencilBlue and then goes on to show how to configure PencilBlue to use it.

Why MongoDB Atlas is the Ideal Database for PencilBlue

MongoDB delivers flexible schemas, rich queries, an idiomatic Node.js driver, and simple-to-use high availability and scaling. This makes it the go-to database for anyone looking to build applications on Node.js.

MongoDB Atlas provides all of the features of MongoDB, without the operational heavy lifting required for any new application. MongoDB Atlas is available on demand through a pay-as-you-go model and billed on an hourly basis, letting you focus on what you do best.

It’s easy to get started – use a simple GUI to select the instance size, region, and features you need. MongoDB Atlas provides:

  • Security features to protect access to your data
  • Built-in replication for always-on availability, tolerating complete data center failure
  • Backups and point-in-time recovery to protect against data corruption
  • Fine-grained monitoring to let you know when to scale. Additional instances can be provisioned with the push of a button
  • Automated patching and one-click upgrades for new major versions of the database, enabling you to take advantage of the latest and greatest MongoDB features
  • A choice of cloud providers, regions, and billing options

Like PencilBlue, MongoDB Atlas is a natural fit for users looking to simplify their development and operations work, letting them focus on what makes their application unique rather than commodity (albeit essential) plumbing.

Installing PencilBlue and Connecting it to MongoDB Atlas

Before starting with PencilBlue, you should launch your MongoDB cluster using MongoDB Atlas and then (optionally) create a user with read and write privileges for just the database that will be used for this project, as shown in Figure 1.

Figure 1: Adding a PencilBlue User to MongoDB Atlas

You must also add your IP address to the IP Whitelist in the MongoDB Atlas security tab (Figure 2).

Figure 2: Add IP Address to MongoDB Atlas Whitelist

If it isn’t already installed on your system, download and install Node.js:

$ curl https://nodejs.org/dist/v4.4.7/node-v4.4.7-linux-x64.tar.xz -o node.tar.xz
$ tar xf node.tar.xz

You should then add the bin sub-folder of the extracted Node.js directory (node-v4.4.7-linux-x64/bin) to the PATH in your .bash_profile before installing the PencilBlue command line interface (CLI):

$ sudo npm install -g pencilblue-cli
Password:
npm WARN engine pencilblue-cli@0.3.1: wanted: {"node":">= 4.4.7"} (current: {"node":"0.12.5","npm":"2.11.2"})
/usr/local/bin/pencilblue -> /usr/local/lib/node_modules/pencilblue-cli/lib/pencilblue-cli.js
/usr/local/bin/pbctrl -> /usr/local/lib/node_modules/pencilblue-cli/lib/pencilblue-cli.js
pencilblue-cli@0.3.1 /usr/local/lib/node_modules/pencilblue-cli
├── process@0.11.8
├── colors@1.1.2
├── q@1.4.1
├── shelljs@0.7.3 (interpret@1.0.1, rechoir@0.6.2, glob@7.0.5)
└── prompt@1.0.0 (revalidator@0.1.8, pkginfo@0.4.0, read@1.0.7, winston@2.1.1, utile@0.3.0)

The CLI can then be used to install and configure PencilBlue itself:

$ pbctrl install PencilBlue
Site Name:  (My PencilBlue Site) PokeSite
Site Root:  (http://localhost:8080) 
Address to bind to:  (0.0.0.0) 
Site Port:  (8080) 
MongoDB URL:  (mongodb://127.0.0.1:27017/) mongodb://pencilblue_user:my_password@cluster0-shard-00-00-qfovx.mongodb.net:27017,cluster0-shard-00-01-qfovx.mongodb.net:27017,cluster0-shard-00-02-qfovx.mongodb.net:27017/?ssl=true&authSource=admin
Database Name:  (pencilblue) clusterdb
Do you want to install Bower components?:  (y/N) 
Cloning PencilBlue from github...
Cloning into 'PencilBlue'...
Installing npm modules...
...
Creating config.js...
Installation completed.

Note that if you need to change the configuration (e.g., to specify a new URL to connect to MongoDB) then edit the config.js file that was created during this step.

The MongoDB URL you provided contains these components:

  • pencilblue_user is the name of the user you created in the MongoDB Atlas UI
  • my_password is the password you chose when creating the user in MongoDB Atlas
  • cluster0-shard-00-00-qfovx.mongodb.net, cluster0-shard-00-01-qfovx.mongodb.net, & cluster0-shard-00-02-qfovx.mongodb.net are the hostnames of the instances in your MongoDB Atlas replica set (click on the “CONNECT” button in the MongoDB Atlas UI if you don’t have these – Figure 3)
  • 27017 is the standard MongoDB port number
  • To enforce security, MongoDB Atlas mandates that the ssl option is used
  • admin is the database that’s being used to store the credentials for pencilblue_user

Figure 3: Find the Hostnames From the MongoDB Atlas UI

clusterdb is the name of the database (schema) that PencilBlue will use (note that unlike some frameworks, the database name is specified separately rather than being embedded in the MongoDB URL).
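Since PencilBlue takes the database name separately, the URL itself carries only the credentials, hosts, and options. One way to sanity-check a URL before putting it into config.js is to split it back into the components listed above. The sketch below is a hypothetical helper, not part of PencilBlue: parseMongoUrl handles only the plain mongodb:// form used in this post, and it assumes any special characters in the user name or password are percent-encoded.

```javascript
// Hypothetical checker for the simple mongodb:// URL form used in this post
// (not mongodb+srv://); percent-encoded user names and passwords are decoded.
function parseMongoUrl(url) {
  const match = /^mongodb:\/\/(?:([^:@/]+):([^@/]*)@)?([^/?]+)\/?([^?]*)(?:\?(.*))?$/.exec(url);
  if (!match) throw new Error(`Not a mongodb:// URL: ${url}`);
  const [, user, password, hostList, database, query] = match;
  return {
    user: user === undefined ? null : decodeURIComponent(user),
    password: password === undefined ? null : decodeURIComponent(password),
    hosts: hostList.split(",").map((entry) => {
      const [host, port = "27017"] = entry.split(":");
      return { host, port: Number(port) };
    }),
    database: database === "" ? null : database,
    options: Object.fromEntries(new URLSearchParams(query ?? "")),
  };
}

// Flags the Atlas requirements described above.
function atlasProblems(parsed) {
  const problems = [];
  if (parsed.options.ssl !== "true") problems.push("ssl=true is required by MongoDB Atlas");
  if (!parsed.options.authSource) problems.push("authSource (e.g. admin) is missing");
  if (parsed.user === null) problems.push("no user name given");
  return problems;
}

const pencilBlueUrl =
  "mongodb://pencilblue_user:my_password@cluster0-shard-00-00-qfovx.mongodb.net:27017," +
  "cluster0-shard-00-01-qfovx.mongodb.net:27017,cluster0-shard-00-02-qfovx.mongodb.net:27017/" +
  "?ssl=true&authSource=admin";

const parsed = parseMongoUrl(pencilBlueUrl);
console.log(parsed.hosts.map((h) => h.host), atlasProblems(parsed));
```

Run against the installer's default of mongodb://127.0.0.1:27017/, the same check reports all three problems, which is a quick reminder to replace it with the Atlas URL.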

The PencilBlue process can now be started:

$ cd PencilBlue/
$ pbctrl start

Confirm that MongoDB Atlas is Being Used

At this point, it is possible to connect to MongoDB Atlas using the MongoDB shell (we’ll look at an easier way to navigate the data later) to confirm that the schema has been created:

$ mongo mongodb://cluster0-shard-00-00-qfovx.mongodb.net:27017,cluster0-shard-00-01-qfovx.mongodb.net:27017,cluster0-shard-00-02-qfovx.mongodb.net:27017/admin?replicaSet=Cluster0-shard-0 --ssl --username billy --password my_password

Cluster0-shard-0:PRIMARY> show dbs
admin      0.000GB
clusterdb  0.008GB
local      0.007GB

Cluster0-shard-0:PRIMARY> use clusterdb
switched to db clusterdb

Cluster0-shard-0:PRIMARY> show collections
article
auth_token
comment
custom_object
custom_object_type
fs.chunks
fs.files
job_log
lock
media
page
password_reset
plugin
plugin_settings
section
server_registry
session
setting
theme_settings
topic
unverified_user
user

Create Your First Page in PencilBlue

Browse to the application at http://localhost:8080 as shown in Figure 4 and create a user account.

Figure 4: Register User in PencilBlue

You’re then able to log in and create your first page (Figure 5).

Figure 5: Create a New Page Using PencilBlue

After saving, the new page can be viewed (Figure 6).

Figure 6: View Pokémon Page in PencilBlue

To visually navigate through the PencilBlue schema and data, download and install MongoDB Compass. Use your MongoDB Atlas credentials to connect Compass to your MongoDB database – Figure 7.

Figure 7: Connect MongoDB Compass to MongoDB Atlas

Navigate through the structure of the data in the clusterdb database (Figure 8); view the JSON documents (Figure 9) and check the indexes (Figure 10).

Figure 8: Explore PencilBlue Schema Using MongoDB Compass

Figure 9: View PencilBlue Documents in MongoDB Compass

Figure 10: View PencilBlue Indexes Using MongoDB Compass

Next Steps

While MongoDB Atlas radically simplifies the operation of MongoDB, there are still some decisions to make to ensure the best performance and reliability for your application. The MongoDB Atlas Best Practices white paper provides guidance on best practices for deploying, managing, and optimizing the performance of your database with MongoDB Atlas.

The guide outlines considerations for achieving performance at scale with MongoDB Atlas across a number of key dimensions, including instance size selection, application patterns, schema design and indexing, and disk I/O. While this guide is broad in scope, it is not exhaustive. Following the recommendations in the guide will provide a solid foundation for ensuring optimal application performance.





Develop & Deploy a Node.js App to AWS Elastic Beanstalk & MongoDB Atlas

Introduction

This blog post demonstrates how to build and deploy an application on AWS Elastic Beanstalk, and have that application connect to MongoDB Atlas as its back-end database service:

  • Introducing the example MongoPop application
  • Connecting applications to your MongoDB Atlas cluster; including IP address whitelisting
  • Downloading and testing MongoPop locally and on AWS Elastic Beanstalk
  • Populating your database with thousands of realistic documents
  • Explaining key parts of the application code
  • Adapting and redeploying applications
  • Graphically exploring your schema and data with MongoDB Compass

AWS Elastic Beanstalk is a service offered by Amazon to make it simple for developers to deploy and manage their cloud-based applications. After you’ve uploaded your application, Elastic Beanstalk automatically takes care of:

  • Capacity provisioning, adding more instances as needed
  • Load balancing
  • Health monitoring

MongoDB Atlas provides all of the features of the MongoDB database, without the operational heavy lifting. MongoDB Atlas is available on demand through a pay-as-you-go model and billed on an hourly basis, letting you focus on your application code.

It’s easy to get started – use a simple GUI to select the instance size, region, and features you need. MongoDB Atlas provides:

  • Security features to protect access to your data
  • Built-in replication for always-on availability, tolerating complete data center failure
  • Backups and point-in-time recovery to protect against data corruption
  • Fine-grained monitoring to let you know when to scale. Additional instances can be provisioned with the push of a button
  • Automated patching and one-click upgrades for new major versions of the database, enabling you to take advantage of the latest and greatest MongoDB features
  • A choice of cloud providers, regions, and billing options

There is clearly a lot of synergy between these technologies – both of them handling the enabling infrastructure, letting the developer spend their precious time on writing great applications. To continue in the spirit of developer productivity, the application used in this post is developed using Node.js, the Express web application framework, and the Pug (formerly Jade) template engine.

The Application – MongoPop

Let’s start by taking a look at what the new MongoPop application provides.

Getting your MongoDB Atlas cluster up and running is a breeze but what do you do with it next? Wouldn’t it be great to populate it with some realistic data so that you can start experimenting? This is what MongoPop does – even letting you tailor the format and contents of the data using the Mockaroo service.

Mockaroo is a flexible service, allowing you to define a rich schema and then generate realistic sample data sets. Supported types include:

  • Email address
  • City
  • European first name
  • JSON array
  • Branded drug names
  • Custom types defined by you, based on regular expressions

Data files can be downloaded from Mockaroo in multiple formats, including: JSON, CSV, and SQL.

MongoPop pulls data from Mockaroo and then automatically writes the data to your database. It defaults to our example Mockaroo schema, but you can replace that with a URL for any schema that you’ve defined in Mockaroo (or any other service providing arrays of JSON documents). MongoPop takes care of connecting to MongoDB Atlas and runs multithreaded, speeding up the process of loading large datasets into MongoDB.
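The idea of loading batches in parallel can be pictured with the sketch below. It is a hypothetical outline, not MongoPop's source: fetchBatch stands in for one Mockaroo request returning an array of documents (assumed to be 1,000 per request, matching the "thousands" field in the form), and insertMany stands in for the MongoDB write, resolving to the number of documents written (with the Node.js driver that figure would come from the insertedCount of the insertMany result). This sketch gets its parallelism from overlapping asynchronous requests in a single Node.js process: a fixed number of workers claim batch numbers from a shared counter, so at most `concurrency` requests are in flight at once.

```javascript
// Hypothetical outline of loading `thousands` batches of 1,000 documents with
// bounded parallelism. fetchBatch and insertMany are injected stand-ins.
async function populate({ thousands, concurrency = 4, fetchBatch, insertMany }) {
  let nextBatch = 0;
  let inserted = 0;

  async function worker() {
    while (nextBatch < thousands) {
      // Claim the index before any await so two workers never take the same batch.
      const batch = nextBatch++;
      const docs = await fetchBatch(batch);
      const written = await insertMany(docs);
      inserted += written; // update after the await so concurrent workers don't lose counts
    }
  }

  const workerCount = Math.min(concurrency, thousands);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return inserted;
}

// Example with fake services that "return" 1,000 generated documents per request.
const fakeFetch = async (batch) => Array.from({ length: 1000 }, (_, i) => ({ batch, seq: i }));
const fakeInsert = async (docs) => docs.length;

populate({ thousands: 5, concurrency: 2, fetchBatch: fakeFetch, insertMany: fakeInsert })
  .then((count) => console.log(`inserted ${count} documents`));
```

Capping the number of workers keeps a large request (say, 100 thousand documents) from opening hundreds of simultaneous HTTP requests and database writes.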

Using MongoPop

Figure 1: Identify IP Address of MongoPop Server for MongoDB Atlas IP Whitelisting

When you first access MongoPop (Figure 1), you’re presented with a form to provide details on how to connect to your MongoDB Atlas instance, and what you’d like the data to look like. Before completing the form, take a note of the IP address that’s displayed. This IP address needs to be added to the whitelist for your group, which is done through the security tab of the MongoDB Atlas UI (Figure 2).

Figure 2: Add MongoPop IP Address to MongoDB Atlas Group Whitelist

In a production Elastic Beanstalk environment, the IP whitelisting can be a little more involved – that will be covered later in this post.

Figure 3: Find the Node.js Driver Connect String in MongoDB Atlas

While in the MongoDB Atlas UI, click the “CONNECT” button, select the “MongoDB Drivers” tab and then the “COPY” button (Figure 3). Paste the copied URI directly into MongoPop. You should also enter the password and the database you want to use.

Note that the URI needs editing before it’s actually used, but MongoPop handles that using the password and database name you provide; the final URI will take this form: mongodb://mongodb_user:my_password@cluster0-shard-00-00-qfovx.mongodb.net:27017,cluster0-shard-00-01-qfovx.mongodb.net:27017,cluster0-shard-00-02-qfovx.mongodb.net:27017/mongopop?ssl=true&authSource=admin.

This URI contains these components:

  • mongodb_user is the name of the user you gave when creating the group in the MongoDB Atlas UI. Alternatively, create a new user in the MongoDB Atlas UI with more restricted privileges.
  • my_password is the password you chose when creating the user in MongoDB Atlas.
  • cluster0-shard-00-00-qfovx.mongodb.net, cluster0-shard-00-01-qfovx.mongodb.net, & cluster0-shard-00-02-qfovx.mongodb.net are the hostnames of the instances in your MongoDB Atlas replica set.
  • 27017 is the default MongoDB port number.
  • mongopop is the name of the database (schema) that MongoPop will use.
  • To enforce over-the-wire encryption, MongoDB Atlas mandates that the ssl option is used.
  • admin is the database that’s being used to store the credentials for mongodb_user.
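The same URI can also be assembled from these components. The sketch below is illustrative only (buildAtlasUri is a hypothetical name; MongoPop itself edits the string copied from Atlas). It percent-encodes the user name and password, so characters such as @ or : in a password do not break the URI, defaults hosts to the standard port, and always sets ssl=true.

```javascript
// Hypothetical helper: assembles an Atlas connection URI of the form shown above.
function buildAtlasUri({ user, password, hosts, database, authSource = "admin" }) {
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  // Default to the standard MongoDB port when a host is given without one.
  const hostList = hosts.map((h) => (h.includes(":") ? h : `${h}:27017`)).join(",");
  const options = new URLSearchParams({ ssl: "true", authSource });
  return `mongodb://${credentials}@${hostList}/${encodeURIComponent(database)}?${options}`;
}

const uri = buildAtlasUri({
  user: "mongodb_user",
  password: "my_password",
  hosts: [
    "cluster0-shard-00-00-qfovx.mongodb.net",
    "cluster0-shard-00-01-qfovx.mongodb.net",
    "cluster0-shard-00-02-qfovx.mongodb.net",
  ],
  database: "mongopop",
});
console.log(uri);
```

For the values above this reproduces the URI shown earlier; with a password such as p@ss:word, the credentials become p%40ss%3Aword, which the driver decodes again.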

The remaining fields define the collection to store the documents, the source of the document schema, and the number of documents (in thousands) to be added. The source URL defaults to a document format already defined but you can create your own by registering at the Mockaroo site, defining the document structure and then using the URL provided.

After clicking the “populate” button, MongoPop fetches the data set(s) from Mockaroo and then adds the documents to your MongoDB Atlas collection. Once the data has been added, the page refreshes and you’re shown a sample of the documents now stored in your collection (Figure 4).

Figure 4: Sample of Data Added to MongoDB Atlas Collection

MongoDB Compass

Congratulations, you now have some data in your database! An optional step is to start exploring that data using MongoDB Compass. The same credentials can be used to connect Compass to your MongoDB database (Figure 5).

Figure 5: Connect MongoDB Compass to MongoDB Atlas

Once connected, explore the data added to the collection (Figure 6).

Figure 6: Explore MongoDB Atlas Data Using MongoDB Compass

In this version (1.3) of MongoDB Compass (currently in beta), it is also possible to add, delete, and modify documents (Figure 7).

Figure 7: Modify a Document in MongoDB Compass

You can verify that the document has really been updated from the MongoDB shell:

Downloading the Application

The tools for deploying your application to AWS Elastic Beanstalk integrate with git, which makes it the best way to get the code. Assuming that git is already installed, downloading the code is simple:

If you then want to refresh your local repository with the latest version:

Alternatively, simply download the zip file.

Testing The Application Locally

Deploying to Elastic Beanstalk is straightforward but there is a delay each time you update and redeploy your application. For that reason, it’s still useful to be able to test and debug locally.

After downloading the application, installing its dependencies and then running it is trivial (this assumes that you already have Node.js installed):

npm install installs all of the required dependencies (which are described in package.json). npm start starts the application – once it is running, browse to http://localhost:3000/pop to try it out.

Deploying to AWS Elastic Beanstalk

You can create your Elastic Beanstalk environment and deploy and monitor your application from the AWS console. If you don’t already have an account then that’s where you would create it. If you already have an account, and a copy of your Access Key ID and Secret Access Key, then using the EB CLI provides a more efficient workflow.

The method for installing the EB CLI varies by platform but if you already have Homebrew installed on OS X then it’s as simple as:

eb init sets default values for Elastic Beanstalk applications created with the EB CLI by prompting you with a series of questions:

eb create creates a new environment and deploys the current application to that environment:

Finally, eb open connects to the MongoPop app from your browser.

If you want to make changes to the application then the EB CLI makes it simple to redeploy the new version. As an example, edit the views/pop.jade file to add an extra paragraph after the title:

The EB CLI integrates with git and so update git with the change and then redeploy:

Figure 8: Personalized Version of MongoPop Deployed to AWS Elastic Beanstalk

When you’re finished with the application, the environment can be deleted with a single command:

Note that this doesn’t remove the application deployment files that Elastic Beanstalk keeps in AWS S3 storage. To avoid continuing charges, delete those files through the AWS console (Figure 9).

Figure 9: Remove Deployment Files From AWS S3 Storage

Code Highlights

The full code for MongoPop can be found in GitHub but this section presents some snippets that are specific to MongoDB and MongoDB Atlas.

Firstly, constructing the final URI to access the database (from views/pop.js):

Connecting to the database and working with the collection (javascripts/db.js):

All of the dependencies (including the MongoDB Node.js driver) are defined in package.json:

The IP Address Whitelisting Challenge

IP address whitelisting is a key MongoDB Atlas security feature, adding an extra layer to prevent 3rd parties from accessing your data. Clients are prevented from accessing the database unless their IP address has been added to the IP whitelist for your MongoDB Atlas group.

VPC Peering for MongoDB Atlas is under development and will be available soon, offering a simple, robust solution. It will allow the whitelisting of an entire AWS Security Group within the VPC containing your application servers.

If you need to deploy a robust, scalable application before VPC peering becomes available, some extra steps may be required.

In our example application, the public IP address of the AWS EC2 instance running MongoPop was added to the MongoDB Atlas whitelist for the group.

That works fine, but what happens if that EC2 instance fails and is rescheduled? Its IP Address changes, so it would not be able to connect to MongoDB Atlas until the new address was whitelisted. That scenario can be remedied by assigning an Elastic IP address (which survives rescheduling) to the EC2 instance using the AWS Console.

What if demand for your application grows and Elastic Beanstalk determines that it needs to add an extra EC2 instance? Again, that instance will have an IP Address that hasn’t yet been added to the MongoDB Atlas whitelist. To cover that scenario (as well as rescheduling), the AWS NAT Gateway service can be used. Figure 10 illustrates a configuration using a NAT Gateway.

Figure 10: Presenting a Single IP Address Using an AWS NAT Gateway

Two subnets are created within the AWS Virtual Private Cloud (VPC):

  • The public subnet contains the front-end servers, which external clients reach through the Internet Gateway (IGW) via an Elastic IP Address. It also contains the NAT Gateway service.
  • The private subnet contains the back-end servers which will access MongoDB Atlas.

Routing tables must be created to route all messages from the private subnet destined for public IP addresses through the NAT Gateway. The NAT Gateway has its own Elastic IP Address which all of the outgoing messages that pass through it appear to originate from – this IP Address must be added to the MongoDB Atlas whitelist.

Messages between the front-end and back-end servers use local IP Addresses and so are routed directly, without passing through the NAT Gateway. Messages from external clients are routed from the IGW to the front-end servers.
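The routing decision described above is a longest-prefix match on each subnet's route table. The sketch below models the private subnet's table to show why traffic between back-end and front-end servers stays local while traffic to MongoDB Atlas goes through the NAT Gateway. The VPC range 10.0.0.0/16, the target names, and the example addresses are hypothetical, and the real route tables are evaluated by AWS, not by application code.

```javascript
// Hypothetical model of the private subnet's route table; AWS evaluates the real one.
function ipToInt(ip) {
  return ip.split(".").reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

function inCidr(ip, cidr) {
  const [base, bits] = cidr.split("/");
  const prefix = Number(bits);
  // A /0 mask must be 0; JavaScript would treat a shift by 32 as a shift by 0.
  const mask = prefix === 0 ? 0 : (~0 << (32 - prefix)) >>> 0;
  return ((ipToInt(ip) & mask) >>> 0) === ((ipToInt(base) & mask) >>> 0);
}

// The most specific (longest-prefix) matching route wins.
function routeFor(ip, table) {
  let best = null;
  for (const route of table) {
    const prefix = Number(route.destination.split("/")[1]);
    if (inCidr(ip, route.destination) && (best === null || prefix > best.prefix)) {
      best = { prefix, target: route.target };
    }
  }
  return best === null ? null : best.target;
}

const privateSubnetRoutes = [
  { destination: "10.0.0.0/16", target: "local" },      // traffic inside the VPC
  { destination: "0.0.0.0/0", target: "nat-gateway" }, // everything else, e.g. MongoDB Atlas
];

console.log(routeFor("10.0.1.20", privateSubnetRoutes));    // a front-end server in the VPC
console.log(routeFor("203.0.113.25", privateSubnetRoutes)); // a public address outside the VPC
```

Because every back-end request to a public address matches only the 0.0.0.0/0 route, it leaves through the NAT Gateway and appears to come from the gateway's single Elastic IP Address, which is the one entry MongoDB Atlas needs.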

Clearly this configuration adds cost and complexity (e.g., the application needs breaking into front and back-end components).

An alternative is to add extra logic to your application so that it automatically adds its IP address to the whitelist using the MongoDB Atlas Public API. If taking that approach, then also consider how to remove redundant IP addresses as the whitelist is limited to 20 entries.
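If you take the Public API route, the main decision is which entries to remove. The sketch below only plans the change and does not call the MongoDB Atlas Public API; its inputs are assumptions. liveIps would come from your own inventory of running instances (for example, the EC2 instances in the Elastic Beanstalk environment), and the auto flag, marking entries the application added rather than ones added by hand, is hypothetical. The 20-entry default mirrors the limit mentioned above.

```javascript
// Hypothetical planner for self-registering application instances.
// whitelist: [{ ip, auto }] where `auto` marks entries added by the application.
function planWhitelistUpdate({ whitelist, liveIps, myIp, limit = 20 }) {
  const live = new Set([...liveIps, myIp]);
  // Only remove entries the application added itself and that no live instance uses.
  const remove = whitelist.filter((e) => e.auto && !live.has(e.ip)).map((e) => e.ip);
  const alreadyPresent = whitelist.some((e) => e.ip === myIp);
  const sizeAfter = whitelist.length - remove.length + (alreadyPresent ? 0 : 1);
  if (sizeAfter > limit) {
    throw new Error(`Whitelist would need ${sizeAfter} entries; the limit is ${limit}`);
  }
  return { add: alreadyPresent ? [] : [myIp], remove };
}

const plan = planWhitelistUpdate({
  whitelist: [
    { ip: "203.0.113.10", auto: false }, // added by hand, e.g. an admin workstation
    { ip: "198.51.100.1", auto: true },  // instance that has since been terminated
    { ip: "198.51.100.2", auto: true },  // still running
  ],
  liveIps: ["198.51.100.2", "198.51.100.3"],
  myIp: "198.51.100.3",
});
console.log(plan);
```

Applying the removals before the addition keeps the list under the limit throughout the update, and leaving hand-added entries alone avoids an instance accidentally revoking an administrator's access.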

What Next?

While MongoDB Atlas radically simplifies the operation of MongoDB, there are still some decisions to make to ensure the best performance and reliability for your application. The MongoDB Atlas Best Practices white paper provides guidance on best practices for deploying, managing, and optimizing the performance of your database with MongoDB Atlas.

The guide outlines considerations for achieving performance at scale with MongoDB Atlas across a number of key dimensions, including instance size selection, application patterns, schema design and indexing, and disk I/O. While this guide is broad in scope, it is not exhaustive. Following the recommendations in the guide will provide a solid foundation for ensuring optimal application performance.

Learn more about the capabilities of MongoDB Atlas and try it out for yourself here.