Archive for Imply

Imply Welcomes Pranav Parekh as Chief Customer Officer 

Posted in Commentary with tags on October 17, 2023 by itnerd

Imply, the company founded by the original creators of Apache Druid, today announced that Pranav Parekh has been named its new Chief Customer Officer, a strategic move aimed at enhancing customer experience and elevating Imply’s commitment to delivering exceptional value to its customers. Parekh will take charge of sales engineering, solution architects, customer success and support, leading an initiative to drive customer-centric innovation and solidify Imply as the industry leader in real-time data analytics.

In this pivotal role as Chief Customer Officer, Parekh will be responsible for reshaping the customer journey, optimizing product and service offerings, and nurturing lasting relationships with Imply’s valued customers. His dedication to enhancing customer experiences aligns perfectly with Imply’s commitment to delivering cutting-edge solutions and unparalleled customer service.

Parekh brings a wealth of experience to his new role at Imply. His career spans more than 25 years, encompassing leadership roles in product management, software development, consulting, sales engineering and customer success at companies including Google, Oracle, BEA Systems, Apigee, DataStax and Akana. Notably, he was a key part of the go-to-market leadership team during Apigee’s successful journey to going public in 2015 and its subsequent acquisition by Google in 2016. Most recently, he led the global field engineering team at DataStax.

Parekh’s appointment underscores Imply’s unwavering dedication to enhancing customer experiences and helping customers realize exceptional value from its products and solutions.

Find out more about Imply here: https://imply.io/.

Imply Joins the Connect with Confluent Partner Program

Posted in Commentary with tags on July 18, 2023 by itnerd

Imply, the company founded by the original creators of Apache Druid, today announced it has joined the Connect with Confluent partner program, an initiative designed to help organizations accelerate the development of real-time applications through native integrations with Confluent Cloud. This partnership brings together Imply and Confluent’s cloud-managed services for Druid and Apache Kafka, respectively, offering developers:

  • Real-time analytics on streaming data of any scale: Stream millions of events per second from Confluent Cloud to Imply Polaris, the cloud database service for Druid, with subsecond latency—making data instantly available for real-time analytics.
  • A connector-free experience: Ingest data from Confluent Cloud directly into Polaris without installing and managing a connector.
  • Cloud-native, fully-managed real-time architecture: Build real-time applications on Kafka and Druid without the production risk and infrastructure management, while accelerating time to value for real-time analytics use cases.

Organizations now have a simplified experience for analyzing Kafka streams via Druid, the real-time analytics database built for streaming data, as Imply and Confluent together have made it easier to power mission-critical and customer-facing applications with real-time data. These applications are used in a wide range of industries for a variety of use cases, including security and fraud analytics, product analytics, IoT/telemetry analytics, and application observability.
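
As a rough illustration of the producer side of this flow, here is a minimal Python sketch using the confluent-kafka client: it streams JSON events into a Confluent Cloud topic, which Polaris can then ingest directly without a connector, as described above. The bootstrap server, API key/secret, topic name and event fields are placeholders for illustration, not details from the announcement.

    # Minimal sketch: stream JSON events into a Confluent Cloud topic that
    # Imply Polaris can then ingest directly, with no connector in between.
    # The bootstrap server, API key/secret, and topic name are placeholders.
    import json
    import time
    from confluent_kafka import Producer  # pip install confluent-kafka

    producer = Producer({
        "bootstrap.servers": "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",  # placeholder
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": "<CONFLUENT_API_KEY>",     # placeholder
        "sasl.password": "<CONFLUENT_API_SECRET>",  # placeholder
    })

    def on_delivery(err, msg):
        """Report per-event delivery failures."""
        if err is not None:
            print(f"Delivery failed: {err}")

    # Emit a few example clickstream-style events; a real producer would
    # stream these continuously at much higher volume.
    for i in range(10):
        event = {"timestamp": int(time.time() * 1000), "user_id": f"user-{i}", "action": "page_view"}
        producer.produce("clickstream-events", value=json.dumps(event), on_delivery=on_delivery)

    producer.flush()  # block until all queued events are delivered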

Introducing the Connect with Confluent program

Connect with Confluent gives organizations direct access to Confluent Cloud, the cloud-native and complete data streaming platform that processes more than an exabyte of data per year. It’s now easier for organizations to stream data from anywhere to Druid with a fully managed Kafka service that spans hybrid, multi-cloud, and on-premises environments. In addition, the program supercharges partners’ go-to-market efforts with access to Confluent engineering, sales, and marketing resources. This helps ensure customer success at every stage from onboarding through technical support.

A partnership built from open source

Imply, like Confluent, is founded on a popular open-source technology. Apache Druid is commonly deployed alongside Apache Kafka, the core technology behind Confluent, in leading companies where real-time analytics are a crucial aspect of product offerings, operations, and customer experiences. Today, many of the world’s industry-leading and digital-native organizations—including Netflix, Salesforce, Citrix, and even Confluent itself—use Kafka and Druid together to gain real-time insights and deliver cutting-edge products.

Together, Confluent and Imply provide a comprehensive, cloud-native platform designed for real-time analytics applications. The integration provides:

  • Effortless scalability: Imply Polaris easily scales data ingestion right alongside Confluent Cloud to jointly handle millions of events per second.
  • A fully-managed experience: Imply Polaris and Confluent Cloud automate key aspects of infrastructure management, ranging from setup to backups and upgrades, providing an effortless, reliable service.
  • Reliability and security: Imply Polaris and Confluent Cloud equip teams with a reliable platform for real-time applications while upholding strict security and compliance requirements.

This collaboration builds upon open-source technologies to empower developers with a cloud-native, real-time architecture purpose-built for analytics on streaming data. Polaris’ native integration with Confluent Cloud gives organizations the opportunity to capitalize on the synergies between Kafka and Druid without the operational complexities and production risk associated with self-managing open-source technologies, accelerating time-to-value for real-time use cases.

Learn More: 

  • Visit the Imply website to learn about the integration with Confluent Cloud
  • Sign up for a free trial of Imply Polaris
  • Read this blog on how to ingest data from Confluent Cloud in Polaris
  • Watch this video to learn what Druid is used for

Imply Announces Full Details of Druid Summit 2022

Posted in Commentary with tags on November 21, 2022 by itnerd

Imply, founded by the creators of Apache Druid, today announced full details for Druid Summit 2022 virtual conferences. Druid Summit 2022 will concurrently serve delegates across the Americas, EMEA and APAC. 

Druid Summit 2022 is a technical conference for a global community of developers building analytics applications. The Summit is aimed at developers, architects and data professionals, and it provides a forum for them to share their experiences with Druid and network with their peers.

Full event details and registration links can be found by visiting https://druidsummit.org/.

This year’s summit features keynote speakers Gwen Shapira, co-founder and CPO of Nile; Matt Armstrong, Head of Engineering – Observability & Data Platform at Confluent; Ben Sykes, Software Engineer at Netflix; Brianna Greenberg, Senior Data Engineer at Reddit; Csaba Kecskemeti, Senior Engineering Manager at ZillowGroup; and Yi Yang, Software Engineer at Pinterest. Also keynoting from Imply are some of the original creators of Apache Druid: Fangjin Yang, co-founder and CEO, and Vadim Ogievetsky, co-founder and CPO.

Offering a wide range of content and activities, Druid Summit 2022 will provide training and education on Druid and its ecosystem. The Summit features talks by industry experts and practitioners covering development methods, architectural patterns, operational best practices and real-world case studies of Druid in production.

For full event details, please visit: https://druidsummit.org/

Imply Announces $100M Investment Led By Thoma Bravo

Posted in Commentary with tags on May 17, 2022 by itnerd

Imply Data, Inc., the company founded by the original creators of Apache Druid, today announced its $100 million Series D financing, which values the company at $1.1 billion. This investment round was led by Thoma Bravo with participation from OMERS Growth Equity, both new investors. Existing investors Bessemer Venture Partners, Andreessen Horowitz and Khosla Ventures also participated in the financing. This round brings Imply’s total funding raised to date to $215 million as the company accelerates to meet the growing need for modern analytics applications.

Demand for Imply is driven by an industry evolution in analytics led by software developers. For decades, analytics have been confined to static executive dashboards and reports powered by batch-oriented data warehouses. Increasingly, leading companies are turning to their developers to build analytics applications that deliver interactive data experiences from streaming data and deliver real-time insights to both internal and external users. And developers at thousands of companies have turned to Apache Druid, the leading real-time analytics database.

This new round of funding will enable Imply to accelerate its mission to help developers become the new heroes of analytics.

This funding round is the latest milestone solidifying Imply’s position as the industry leader in this emerging category. It follows the recent product and open source innovation announced in March—specifically, the launch of Imply Polaris, the fully-managed DBaaS built from Apache Druid, and the introduction of a new multi-stage query engine that makes Druid the only database to support advanced reports and complex alerts alongside interactive, real-time analytics.

As a leading contributor to Apache Druid, Imply delivers the complete developer experience for Druid as a fully-managed DBaaS (Imply Polaris), hybrid-managed software offering (Imply Enterprise Hybrid) and self-managed software offering (Imply Enterprise). The company builds on the speed and scalability of Apache Druid with committer-driven expertise, effortless operations and flexible deployment to meet developers’ application requirements with ease. Organizations trust Imply’s technology to play a key role in their internally-facing and customer-facing solutions and services.

Imply Announces Polaris

Posted in Commentary with tags on March 1, 2022 by itnerd

Imply, the company founded by the original creators of Apache Druid®, today unveiled at a virtual event the first milestone in Project Shapeshift, the 12-month initiative designed to solve the most pressing issues developers face when building analytics applications. The announcement includes a cloud database service built from Apache Druid and the private preview of a multi-stage query engine for Druid. Together, these innovations show how Imply delivers the most developer-friendly and capable database for analytics applications.

Developers are increasingly at the forefront of analytics innovation, driving an evolution in analytics beyond traditional BI and reporting to modern analytics applications. These applications—fueled by the digitization of businesses—are being built for real-time observability at scale for cloud products and services, next-gen operational visibility for security and IT, revenue-impacting insights and recommendations and for extending analytics to external customers. Apache Druid has been the database-of-choice for analytics applications trusted by developers at 1,000+ companies including Netflix, Confluent and Salesforce.

As developers turned to Apache Druid to power interactive data experiences on streaming and batch data with limitless scale, Imply saw tremendous opportunity to simplify the end-to-end developer experience and extend the Druid architecture to power more analytics use cases for applications from a single database.  

Real-Time Database as a Service Built from Apache Druid

Building analytics applications involves operational work for software development and engineering teams across deployment, database operations, lifecycle management and ecosystem integration. For databases, cloud database services have become the norm as they remove the burden of infrastructure from cluster sizing to scaling and shift the consumption model to pay-as-you-use. 

Imply Polaris, however, is a cloud database service reimagined from the ground up to simplify the developer experience for analytics applications end-to-end. Much more than cloudifying Apache Druid, Polaris drives automation and intelligence that delivers the performance of Druid without needing expertise, and it provides a complete, integrated experience that simplifies everything from streaming to visualization. Specifically, Polaris introduces:

  • Fully-Managed Cloud Service – Developers can build modern analytics applications without needing to think about the underlying infrastructure. No more sizing and planning required to deploy and scale the database. Developers can start ingesting data and building applications in just a few minutes.
  • Database Optimization – Developers get all the performance of Druid they need without turning knobs. The service automates configurations and tuning parameters and includes built-in performance monitoring that ensures the database is optimized for every query in the application. 
  • Single Development Experience – Developers get a seamless, integrated experience to build analytics applications. A built-in, push-based streaming service via Confluent Cloud and a visualization engine, integrated into a single UI, make it simple to connect to data sources and build rich, interactive applications. 

Evolving the Druid Architecture

From its inception, Druid has uniquely enabled developers to build highly interactive and concurrent applications at scale, powered by a query engine built for always-on applications with sub-second performance at TB to PB+ scale. Increasingly, however, developers need data exports, reporting and advanced alerting included with their applications, requiring additional data processing systems to deploy and manage.

Today, Imply introduces a private preview of a multi-stage query engine, a technical evolution for Druid that reinforces its leadership as the most capable database for analytics applications. The multi-stage query engine—in conjunction with the core Druid query engine—will extend Druid beyond interactivity to support the following new use cases in a single database platform:

  • Druid for Reporting – Improved ability to handle long-running, heavyweight queries to give developers a single database for powering applications that require both interactivity and complex reports or data exports. Cost-control capabilities make these heavyweight queries affordable.
  • Druid for Alerting – Building on Druid’s longstanding capability to combine streaming and historical data, the multi-stage query engine enables alerting across a large number of entities with complex conditions at scale.
  • Simplified and More Capable Ingestion – Druid has always provided very high concurrency and very fast queries across large data sets. Using the same SQL language that Druid already supports for queries, the new multi-stage query engine enables simplified ingestion from external storage systems, including HDFS and object stores such as Amazon S3, Azure Blob Storage and Google Cloud Storage, with in-database transformation, making data ingestion easy without giving up any of Druid’s power to enable interactive conversations in modern data analytics applications (a sketch of this SQL-based ingestion follows this list). 
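
To illustrate the idea, here is a hypothetical sketch of SQL-based ingestion submitted to a Druid cluster over HTTP. The syntax and endpoint follow the EXTERN-style SQL ingestion that open-source Druid later shipped with its multi-stage query engine; the private preview described in this announcement may differ, and the host, datasource, bucket and columns are made up for illustration.

    # Hypothetical sketch of SQL-based batch ingestion from an object store.
    # Syntax and endpoint follow what open-source Druid later shipped
    # (EXTERN + the /druid/v2/sql/task API); the preview described above
    # may differ. Host, datasource, bucket and columns are illustrative.
    import requests

    INGEST_SQL = """
    INSERT INTO "pageviews"
    SELECT
      TIME_PARSE("ts") AS __time,
      "user_id",
      "action"
    FROM TABLE(
      EXTERN(
        '{"type": "s3", "uris": ["s3://example-bucket/pageviews/2022-03-01.json"]}',
        '{"type": "json"}',
        '[{"name": "ts", "type": "string"}, {"name": "user_id", "type": "string"}, {"name": "action", "type": "string"}]'
      )
    )
    PARTITIONED BY DAY
    """

    # Submit the statement as an ingestion task and get back a task id to poll.
    resp = requests.post(
        "https://druid.example.com/druid/v2/sql/task",  # placeholder host
        json={"query": INGEST_SQL},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())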

Learn More

Product Availability:

Imply Polaris is Generally Available and can be accessed via imply.io/polaris-signup

The new multi-stage query engine is in private preview and can be requested via contact@imply.io 

Guest Post: As Data Analytics Evolves, We Need to Get Real (Time)

Posted in Commentary with tags on February 14, 2022 by itnerd

By Darin Briskman, Director of Technical Marketing for Imply

We like data! We also like thinking about how to use data to get the insights we crave to accelerate our success – improving health outcomes, getting the right products quickly to the people who need them, increasing opportunity and equity, understanding risks, helping people find the music and games they want, and the millions of other fun and cool things we can do with data.

After over 30 years of working with data analytics, we’ve been witness (and sometimes participant) to three major shifts in how we find insights from data – and now we’re looking at the fourth.

The first shift – Going to CRUD

In the beginning, Codd created the database. And he looked upon it and saw that it was CRUD.  

It wasn’t really the beginning, of course. There had been databases for a few decades, using hierarchical and network models that were focused on automating legacy processes that had been done using pens, paper, and mechanical calculators. But when IBM’s Dr. Ted Codd published “A Relational Model of Data for Large Shared Data Banks” in 1970, it kicked off a new era for data, with relational databases as the basis of a data revolution in the 1980s and 1990s, defining the tables with rows and columns that we all use today.

Another group at IBM developed SQL, which made getting data into databases and out of databases much easier. An explosion of relational databases followed, as groups around the world used SQL with Oracle, DB2, Sybase, Ingres, and too many other relational databases to name.

At its core, relational SQL is CRUD: tools to Create, Read, Update, and Delete data. It was a brilliant approach to making large data sets practical at a time when compute and storage were very expensive – in 1983, when Oracle made its first sale (to the Central Intelligence Agency), a GB of storage cost about $500,000 (in 1983 dollars – that’s about $1.4m today), while a GB of memory cost about $2m ($5.6m today).

To control these costs, CRUD gained a collection of tools to store data more efficiently by breaking data into lots and lots of smaller tables, a practice Dr. Codd named normalization (why? A big news story of the 70s was the US “normalizing” its relationship with China; Codd figured that if Nixon could normalize China, he could normalize data). This added complexity to data management, which meant more developer time to work with data. But when a GB of storage cost the same as 5 person-years of developer time, the complexity was considered well worth the price.

Highly normalized CRUD is great for transactions, where you need to input data fast and get answers to simple questions, like “what’s the status of order #8675309?”. As more data became available, people wanted to ask more complex questions, like “what are my 10 most profitable products and how has that changed over the last 8 quarters?”. The answer: analytical databases.

Analytics requires data stored in an analytics-friendly format, with the data at least partially de-normalized (fewer, bigger data tables). It became clear that using the same dataset for both transactions and analytics would make both work poorly, so early analytics started by using a second copy of the data on a second installation of the database software. 

The second shift – CRUDdy Appliances

As analytics became more complex, we saw the rise of appliances – dedicated data warehousing hardware + software from Teradata, Netezza, Greenplum, and others. It was still all relational CRUD, with whole new categories of software created to extract data from transactional systems (finance, human resources, shipping, supply chain, sales, and such), transform it to a different CRUD schema that is friendly for analytics, and load it into analytic databases, using software from Informatica, IBM, and others. We also saw the rise of business intelligence tools to turn data into pictures and reports that humans can more easily use, like Hyperion, Business Objects, Cognos, and MicroStrategy.

This whole data ecosystem was disrupted and reformed, first by the Internet. The Internet radically increased the amount of data created and used. In 1995, a “big application” might be an SAP system with 5,000 users, and a 1TB data warehouse was considered huge. By 2005, “big applications” like Google search, Amazon commerce, and Facebook had millions of users. Pushing this much data through a CRUD pipeline was both too expensive and ineffective. Something new was needed.

The third shift – CRUD in the Cloud

A new generation of analytics databases – Aster Data, Vertica, ParAccel, and others – arose to deal with larger datasets. As this new generation entered the market, many believed they would displace data warehousing as we knew it, connecting the new realities of our internet age with the CRUDdy infrastructure of the past. Little did these technologies know that those new realities were going to bring about a change that would disrupt their very foundations. The internet brought home a new friend to meet the parents: the Cloud. Life with data changed again. 

With effectively unlimited cheap computing power and cheap storage on demand – first from Amazon Web Services, and soon from Microsoft Azure, Google Cloud, and many others – it was now possible to re-design and re-create how we approach analytics. For one of the clearest stories of just how transformational cloud deployment and operations were for these databases, we can look at ParAccel. As a technology, it was one of the newcomers in this generation, but it was struggling in the marketplace. Then it formed a partnership with AWS, which took the ParAccel technology and offered it as a service known as Redshift. Redshift took off, opening the door for other cloud-native data warehouses like Google BigQuery and Snowflake. High scalability, combined with new cloud-focused data pipeline tools (like Fivetran and Matillion) and business intelligence tools (Looker, Tableau, Domo, Apache Superset, and others), redefined the data warehouse. 

Of course, Cloud Computing also powered the rapid growth of applications, as not just Internet giants but a wide range of businesses and governments found themselves operating applications with millions or tens of millions of users. Pushing this much data through a CRUDdy pipeline just takes too long and costs too much.

As we entered the 2010s, data engineers were struggling with this problem. How can we have interactive conversations with high-volume data? The data streams in from the Internet and other applications – why not just analyze the data stream instead of converting it all to relational CRUD?

The need for a Modern Database

We can find a great example of how this shift to powering analytical applications shows up in the real world by looking at Reddit. They explain in a blog post (https://www.redditinc.com/blog/scaling-reporting-at-reddit/) how they need to expose direct insights into the effectiveness of their advertising and just couldn’t do it without new database options.

Reddit advertisements generate tens of gigabytes of event data per hour. To let advertisers both understand their impact and decide how to target their spending, Reddit needs to enable interactive queries across the last six months of data – hundreds of billions of raw events to sift through!

Reddit needs to empower advertisers to see audience groups and sizes in real time, adjusting based on interests (sports, art, movies, technology …) and locations to find how many redditors fit their target.

(There’s a detailed explanation of this on YouTube)

Here we see the changes required by modern analytics applications: large numbers of people (in this case, advertisers) conducting interactive conversations with large, fast moving data sets that combine new data from streams with historical data.

The fourth shift – Beyond CRUD

As you might have noticed from the Reddit example, there is a new database hiding in this solution: Druid.

As the need to stream analytics emerged, several projects tried different approaches to make it work. One advertising technology company needed a database that could combine stream analytics (for high-volume incoming data) with historical data (stored as relational CRUD) and found that every existing technology was either too slow, not scalable enough, or too expensive for their needs. Since they needed a database that could shift “shape” to address both streaming and historic data, and they had grown up playing Dungeons & Dragons, the new database was named after the D&D druid, a sort of shapeshifting magician. 

Druid was open-sourced in 2012 and later became a top-level project of the Apache Software Foundation, and it was quickly adopted by a wide range of people looking to analyze streams or a combination of stream data and historical data. Druid became a leader in the field of real-time databases, and, over time, several companies were created to help developers use Apache Druid®, led by Imply Data, founded by Druid’s co-creators.
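
To make the Reddit-style scenario above concrete, here is a sketch of the kind of interactive query such an application might run over Druid's standard SQL HTTP API (/druid/v2/sql): counting distinct users by interest and region over a six-month window. The host, datasource, and column names are hypothetical.

    # Sketch of an advertiser-facing query against Druid's SQL HTTP API.
    # Host, datasource, and column names are hypothetical; /druid/v2/sql
    # is Druid's standard SQL query endpoint.
    import requests

    AUDIENCE_SQL = """
    SELECT
      "interest",
      "region",
      APPROX_COUNT_DISTINCT("user_id") AS unique_users
    FROM "ad_events"
    WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '6' MONTH
      AND "interest" IN ('sports', 'movies', 'technology')
    GROUP BY 1, 2
    ORDER BY unique_users DESC
    """

    resp = requests.post(
        "https://druid.example.com/druid/v2/sql",  # placeholder host
        json={"query": AUDIENCE_SQL},
        timeout=30,
    )
    resp.raise_for_status()
    for row in resp.json():  # default result format is a list of JSON objects
        print(row["interest"], row["region"], row["unique_users"])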

To make something like this work, you need sub-second response times for questions from billions of data points, some in streams and some in historical datasets. Concurrency is also paramount, as there may be dozens or hundreds or more people asking questions of the data at the same time. And, of course, it needs to be done on a budget, where the value delivered greatly outweighs the cost of operation.

While storage and computing still cost money, in modern development they are far, far smaller than the cost of developer time – compute power is now a few dollars per hour, while object storage costs $23/TB per month or less. Meanwhile, the fully loaded cost of a US developer, including salary, benefits, equipment, and management, is $55 – $80 per hour. Developer time (and, once the application is deployed, similar costs for administrators to operate it) is by far the greatest expense. In modern economics, if you spend an hour a day of a single developer’s or administrator’s time to save a TB of storage, you are spending well over $14,000 per year to save less than $300 of storage.
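
To make the trade-off explicit, here is the back-of-envelope arithmetic using the figures above (the hourly rate, working days per year, and storage price are the stated assumptions):

    # Back-of-envelope comparison: an hour of developer/administrator time
    # per working day versus the storage cost it saves, using the figures above.
    DEV_RATE_PER_HOUR = 55        # low end of the $55-$80 fully loaded rate
    WORKING_DAYS_PER_YEAR = 260   # assumes ~5 working days per week
    STORAGE_PER_TB_MONTH = 23     # object storage, dollars per TB per month

    labor_cost_per_year = DEV_RATE_PER_HOUR * WORKING_DAYS_PER_YEAR   # $14,300
    storage_saved_per_year = STORAGE_PER_TB_MONTH * 12                # $276 per TB

    print(f"Labor spent:   ${labor_cost_per_year:,} per year")
    print(f"Storage saved: ${storage_saved_per_year:,} per TB per year")
    # Even at the low end of the rate, the labor costs roughly 50x more
    # than the storage it saves.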

The Path Forward – Still some CRUD, but also Modernity

We have entered a new age, and CRUD is no longer enough.

There are still good uses for analytics with relational CRUD. Most organizations still need annual and quarterly reporting, if only to meet regulatory requirements. This sort of “not real time” reporting works well with CRUD.

For teams to have meaningful interactive conversations with data, modern real-time databases are key. It just takes too long and costs too much to push all the data through the CRUD data pipeline. 

If you are a developer or a professional with an interest in data, I strongly suggest you take a look at the real-time databases now available. For me, the one with the best combination of support and capabilities is Imply Enterprise, which is built using Apache Druid, adding technical support and packages for easy deployment to automate scaling and operations. But whatever you choose, be ready to take your team beyond CRUD and embrace modernity!

Guest Post: The Rise of a New Analytics Hero in 2022 

Posted in Commentary with tags on January 5, 2022 by itnerd

By David Wang, Vice President, Product Marketing, Imply 

The Rise of a New Analytics Hero in 2022 

Every year, industry pundits predict that data and analytics will become more valuable the following year. But that doesn’t take a crystal ball to predict. There’s actually something much more interesting happening that’s going to change everything in the analytics world: the rise of a new hero, the software developer.

If the past is any indication of the future, then what we are seeing is a major transformation unfolding across every industry: a changing of the guard, so to speak, of the ones who are creating value from data.

Today, the industry at large equates analytics with data warehousing and business intelligence.  It’s a traditional approach of BI experts querying historical data “once in a while” for the executive dashboards and reports that have been around for decades.  

But for bleeding-edge companies like Netflix, Target, and Salesforce, their use of analytics is much more progressive – and much more impactful and real-time. Companies like these see the true game-changer for data in the hands of their software developers.  

Their developers are building modern analytics applications and doing it with Apache Druid to deliver interactive data experiences for investigative, operational, and customer-facing insights.

But what’s causing the emergence of these apps, and what does it mean for developers?

Let’s break down the Top 5 reasons:

#1 The need for interactive analytics at scale is taking off

Increasingly, analytics are needed to understand a situation or investigate a problem. This requires the freedom to slice and dice and interact with data live, with sub-second query response at any scale. It’s a dynamic user experience that can best be created via a developer-built application.

No one wants to sit around waiting for a query to process. And while many databases will claim the checkbox for interactivity and speed, they’ll come with lots of scale constraints. They’ll rely on tricks like roll-ups, aggregations, or limiting queries to recent data only to make queries appear faster, but that just restricts the insights you can actually get. So the operative word here is “scale”.

#2 High concurrency is becoming a must-have for every use case

The days of relying on a few BI analysts to write SQL queries are seemingly in the rear-view. Data-driven companies today want to give everyone – from product managers to ops teams to data scientists – free access to explore. And multi-tenancy takes the user count even further. But concurrency doesn’t just come from the number of users. Developers are being asked to build analytics apps with dozens of visualizations, each firing off several concurrent SQL queries.
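
To picture what that looks like from the application side, here is a sketch of a dashboard page firing several SQL queries at Druid in parallel over its standard SQL HTTP API. The host, datasource, and queries are hypothetical.

    # Sketch: a dashboard page firing several SQL queries at Druid in parallel.
    # Host, datasource, and queries are hypothetical; /druid/v2/sql is Druid's
    # standard SQL query endpoint.
    from concurrent.futures import ThreadPoolExecutor
    import requests

    DRUID_SQL_URL = "https://druid.example.com/druid/v2/sql"  # placeholder host

    PANEL_QUERIES = {
        "events_per_minute": """
            SELECT TIME_FLOOR(__time, 'PT1M') AS "minute", COUNT(*) AS events
            FROM "app_events"
            WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
            GROUP BY 1 ORDER BY 1
        """,
        "top_countries": """
            SELECT "country", COUNT(*) AS events
            FROM "app_events"
            WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
            GROUP BY 1 ORDER BY events DESC LIMIT 10
        """,
        "error_rate": """
            SELECT COUNT(*) FILTER (WHERE "status" >= 500) * 1.0 / COUNT(*) AS error_rate
            FROM "app_events"
            WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
        """,
    }

    def run_query(sql):
        """Run one Druid SQL query and return its rows as JSON objects."""
        resp = requests.post(DRUID_SQL_URL, json={"query": sql}, timeout=30)
        resp.raise_for_status()
        return resp.json()

    # Each visualization issues its query concurrently, as a dashboard would.
    with ThreadPoolExecutor(max_workers=len(PANEL_QUERIES)) as pool:
        results = dict(zip(PANEL_QUERIES, pool.map(run_query, PANEL_QUERIES.values())))

    for panel, rows in results.items():
        print(panel, rows[:3])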

Now I’ll admit – it’ll be hard to find a modern database today that doesn’t claim high concurrency. You obviously wouldn’t want to force-fit Postgres (or even Elastic) into uncomfortable positions. But what about scale-out cloud data warehouses? Doesn’t elasticity = scale = high concurrency? Of course, but elasticity without insane compute efficiency (like with Apache Druid) is going to make for a really expensive app.

#3 Desire to unlock the value of streaming data with analytics

Businesses of all kinds are rapidly adopting event-streaming platforms like Apache Kafka. Our friends at Confluent, the creators of Kafka, have built a data mesh that puts data ‘in motion’. With data swirling around constantly, what better use of it than to analyze it for continuous, real-time insights?

Companies like Netflix are doing this, and their developers are creating a huge competitive advantage by bringing together Apache Kafka and Druid to build an analytics app that enables a high-quality, always-on user experience.

With an eye on real-time analytics, several things have to be taken into account.  Is analyzing streams alone enough – or does the use case need to compare streams against historical data?  For Intercontinental Exchange, it’s the full spectrum from present to past that gives them the right security visibility. Does ingestion scalability matter – do you need to process millions of events per second? What about latency or data quality?

#4 More and more companies want to give their customers analytics

Analytics of the past were about making better decisions for the business. While that’s still very relevant – and a huge opportunity to create more value – we are increasingly seeing companies build analytics apps to deliver insights to their customers.

Companies like Twitter, Cisco ThousandEyes, and Citrix are doing this and driving material revenue. They’re giving their customers visibility and insights – and that in turn creates big business for them.

But it can get pretty hairy to use just any database to build a customer-facing analytics app. There’s way more on the line than with internal use cases when you think about SLAs and the customer experience. It’s in these apps where milliseconds of latency make a difference, downtime is costly, and concurrency demands and costs go through the roof. Thankfully there’s a database for that!

#5 The digitization of everything is built with analytics 

At this point in tech, I think we all see that every company is becoming a software company. But with everyone having easy access to the cloud, simply building cloud software and services isn’t enough to sustain an advantage. That’s why companies like Salesforce and Airbnb build analytics apps to optimize how they build their products.

Developers there – and at the best software companies – are building analytics apps to help them create the best product experiences.  Whether it’s next-gen observability, user behavior insights, live A/B testing, or even recommendation engines, an analytics app is at work.

The crystal ball for 2022

There you have it. Our prediction for this year. We see the world of analytics expanding rapidly to modern analytics apps – with developers becoming the new analytics heroes in organizations.  

Here’s to 2022!