Dan Rosanova shares his experience on how to choose the right data handler for different types of data. He focuses on Azure Event Grid, Azure Event Hub and discusses their features and functionalities in relation to data processing.
Integrate 2018, June 4-6, etc.venues, London
I’m going to introduce Dan Rosanova, who is a principal program manager at Microsoft. And Dan is going to be talking about the Reactive Cloud Azure Event Grid. Over to you, Dan.
Dan: Thank you. I think this is my fifth time here. Today is my 1,491st day at Microsoft, so a little anniversary for me. And, in the past, I used to talk about BizTalk here but I will not today. And I’m actually going to change topic a little bit to follow up from Clemens’s session. We’re going to talk a little bit about eventing and streaming with Azure and, sort of, what these are and what we think about them, so this won’t just be a Grid talk. But when we look at what’s happening in the cloud, in the world actually, like, what’s going on and why is there so much data right now?
Some stats out here that are pretty impressive are that we actually, 10 days or so ago, a week ago, we passed 2 trillion requests per day in Azure Event Hubs globally in the public cloud. And if you look at some of these other, these other stats, things like there are 269 billion emails sent per day, I’m not sure how many of those are real compared to spam, but there’s a lot. And then, what I like a lot actually is the 60 billion gigabytes of data created every single day. So, 60 billion gigabytes, that’s a huge amount of data.
So, these are just some stats to give you an idea of what’s going on, not just in our cloud, only one of these is from our cloud, but in the world in general, and of why messaging and eventing and streaming are such a big deal now. If we look at how big data came to be, it really is the confluence or the joining of these things: the fact that there’s a lot of data in the world being created every day, and that it covers a lot of different types, so that information is structured sometimes and sometimes it’s not.
Some of it are streams, I’ll talk a little bit more about what I mean by streams, some of it are messages, which I think Clemens did a very good job describing. Some of them are files, like, who in here is doing integration with files today? Anyone? Okay. I am, so, I mean, I would guess most people are. Some of the information, some of that 60 billion gigabytes a day is immediately useful, and some of it’s not, some of it needs some work, some refinement, to be useful, and some of it just never will be useful. And figuring out what of that is important is a big task that you’ll be facing, as people doing integration.
And when you look at the tools that people are using to do integration, this is a pretty diverse set of tools, just some I put on this slide just for fun this morning. Actually, I wasn’t going to put BizTalk on here and now I know I did. So, I usually put all sorts of fun pictures of BizTalk in my presentations but I’m not going to this year. So, just a few of these that are interesting though is that who is using any of these to do integration today? Who’s using BizTalk? Anyone using other things than Azure? Okay. Yeah. Anyone using Kafka? Okay, good. Not good, but we’ll see.
So, even within Azure, we have a lot of tools to do integration with. And I just rattled these off in my hotel room last night, tools I’ve used in Azure, for better or worse, to do integration with, and we’ll talk about some of these and what they’re good for in integration. So, I would imagine these are some of the tools that you’re probably looking at in Azure as well. And then, you look at this list, and there’s more than this, this is just a short list, and the first thing you’re going to think to yourself is, “Well, which one’s the right tool?” Right? “How do I know which is the right one?” Which was a challenge my dog faced all the time. These are his toys.
So, when we look at the segmentation of messaging in the cloud market, this is a slide that I usually don’t show publicly which talks about simple queuing, which would be storage queues, eventing and PubSub, which would be Grid, data streaming, which would be Event Hubs, and enterprise messaging, which would be Service Bus. We definitely see these as different, they are designed for a different purpose. There are things you care about for each of these and things you don’t care about or things you’re willing to give up to get that.
So, Event Hubs is a great example. People there are thinking in terms of streams, they’re thinking in terms of megabytes or gigabytes of data, they’re not usually thinking in terms of individual messages. So, they’re willing to give up some things for that, like a server-side cursor or once-and-only-once delivery, you know, because those are trade-offs you make, there’s no one super tool here. And when I look at how these are broken down into, sort of, the ecosystems that they live in, I see serverless, big data, and enterprise solutions really sort of driving these.
And I think us adding Kafka to Event Hubs made it much clearer what it was, you know. Event Hubs is pretty much Kafka. We looked at Kafka and hosted Kafka, and we went and made Event Hubs four or five years ago as an alternative to trying to make Kafka work as a multi-tenant cloud platform, but we were able to, kind of, make that fit in. And when you look at something like Event Hubs and call it Kafka, it makes much more sense why we have IoT Hub and Event Hub, because they’re really three different things.
And so, you do get people that ask, though, “Which one is the right tool?” Or, “Which one’s the best tool?” That’s my favorite one, especially because we manage all of these. With every new service people think, “Oh, should I stop using Service Bus because Grid is out now?” Maybe, but it’s like this. So, which of these is the best to eat with? That’s very much a context-driven question, right? I mean, I might have my favorite, or maybe the one I go to by default, but sometimes, it’s really easy to decide what’s the best tool to use. This is a case where it’s pretty easy to know to use the spoon.
Sometimes, it’s not necessarily as clear, you might need more than one tool for this. Sometimes, it’s just confusing and you just don’t know, like, how, what is that? And sometimes, you do need more than one tool. And then, there are other times when you need a specialized tool that you weren’t really planning on. And then, you get to a place, though, where you still get people that say, “I just want one tool, make my life easy.” In reality, we’re not making your life easy when we combine tools, we’re making your life more difficult. Has anyone tried to eat a steak with a spork? I would challenge anyone to try that. So, the spork is a suboptimal tool that’s combining two very well-designed tools in a very poor package.
So now, I wanna talk a little bit about some of these tools. The two I’m going to talk about today are Event Grid and Event Hub, in some detail. So, looking at Grid, some of this you saw from Clemens already, the whole idea here is that it’s this high-level PubSub sort of thing. And I shouldn’t say “sort of thing,” it’s this high-level PubSub platform that exists all across Azure. We designed it, really, to serve the serverless space so it’s really actually hard to make serverless work without some sort of eventing infrastructure behind it.
And one of our team members, a VP on Azure, used to say that functions doesn’t function without Grid. And it’s your execution, like your function piece, really is like in code, it’s the on click, on button click, and you need some way to fire that on button click. And this is what Grid is meant to do in the serverless landscape. In the ops automation landscape, it’s really designed for something else, which is similar, it’s reacting to events, but it’s a different type of reacting to events, and that’s really for more people who are responsible for operations of systems and platforms.
They wanna be notified about changes, they wanna be notified about deployments, about scale events, about all sorts of things. So, you might wanna tag VMs as they’re created, just get notifications any time a VM is created, you might want to do that for, like, storage accounts. You run your own policies and rules to say, “Does this meet my requirements of where I expect my storage to be and does it have the features like encryption turned on that I expect it to have?”
And then, the last one is really third-party integration. And this is a place where you can extend your applications to actually raise their own events and then do eventing between applications. So, say you have something like CRM and you wanna fire things on Azure based on your CRM, or even from on-premises systems. At a very high level, Grid is pretty simple. There are five concepts in total. Five is actually too many, I should probably cut this down to three or four. As a former consultant, pretty much every slide I do internally at Microsoft has between two and five bullet points on it, and the right answer is always three.
So, the first concept is what happened? That’s the event, what happened? So, a file was created in storage. The second is the publisher involved, so storage is that publisher, “Where did this take place?” Well, the event took place in Azure storage. Next is the topic, which usually in Grid you only, sort of, see by proxy, you’ll see what I mean in a second. So, that is where publishers, kind of, express or push their events. And the next, the fourth, is subscriptions. How do you express your intent to receive events or your interest in specific events? That’s with the subscription. And the fifth are the handlers, where do those events go?
So, if we back up a little bit and look at an actual event, what does an event look like? It looks like this. Is anyone familiar with Event Grid? Has anyone heard of it before? That should be every hand because Clemens mentioned it. But anyone using it? Okay. So, this is an event. Let me see if you can see that okay on that screen, yeah, it’s okay. The pieces of interest on here are really topic, subject, and event type. That is how you express your intent, what you are interested in. You say, “I’m interested in things from this topic and this event type,” and you can do prefix matching on the subject.
For any event that comes in, the data payload, the lower part, is actually driven by that event type at the top. So, what’s in that data JSON changes. And you’ll notice this is a storage event. So, that topic is actually the resource manager path to a storage account in Azure, the subject is the path within that storage account to a specific blob, and this is a blob created event, and then the data tells you where this blob was created, there’s a URL for it, it tells you which API created it, tells you the content length and some other interesting stuff about it. So, this is really metadata about the storage event that happened. So, this is the storage event, not the storage file.
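To make the topic, subject, and event type roles concrete, here is a minimal Python sketch of a blob created event and a subscription-style filter that matches on topic, event type, and subject prefix. The field names follow the event shown on the slide as described above; every value is a placeholder, and the `matches` helper is a hypothetical illustration, not Event Grid’s own filtering code.

```python
# Illustrative Event Grid-style event. All values are placeholders.
blob_created_event = {
    # Resource manager path to the storage account (the publisher's topic).
    "topic": "/subscriptions/<subscription-id>/resourceGroups/<group>"
             "/providers/Microsoft.Storage/storageAccounts/<account>",
    # Path within the storage account to the specific blob.
    "subject": "/blobServices/default/containers/uploads/blobs/example.json",
    "eventType": "Microsoft.Storage.BlobCreated",
    # The shape of "data" depends on the event type.
    "data": {
        "api": "PutBlob",
        "contentLength": 1024,
        "url": "https://<account>.blob.core.windows.net/uploads/example.json",
    },
}


def matches(event, topic, event_types, subject_prefix=""):
    """Hypothetical subscription filter: topic, event type, subject prefix."""
    return (
        event["topic"] == topic
        and event["eventType"] in event_types
        and event["subject"].startswith(subject_prefix)
    )


if __name__ == "__main__":
    wanted = matches(
        blob_created_event,
        topic=blob_created_event["topic"],
        event_types={"Microsoft.Storage.BlobCreated"},
        subject_prefix="/blobServices/default/containers/uploads/",
    )
    print("deliver to handler:", wanted)
```

Note that the event carries metadata about the blob (where it is, how big it is), not the blob’s contents, which matches the "storage event, not the storage file" point.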
When you look at the core concepts of events, they are individual. So, they are unrelated to each other just like in the user interface, there is no order in events in the user interface, that would make a very frustrating experience. They’re individually significant, Clemens did a good job talking about this, they happen even if no one is listening, and they are subscriber focused. And what makes Grid different from Service Bus is that in Service Bus, the publisher knows quite a bit about the message, they’re making choices about lifetime, about what goes in the header to do routing, things like that. And in Grid, the model is turned around completely and it’s all subscriber focused.
What we’re really trying to do is bring this on click or event model that was big in C# to the cloud, which is why we use the same icon as Visual Studio did. So, a quick “hello, world” of Event Grid, and I almost never live demo, I think, in my five years doing this, this is the first time I’ll live demo anything here, so wish me luck. Yeah. If we look at a storage account, this is an Azure storage account, how does Grid work, right? So, Grid is bolted on to everything in Azure so you really don’t have to think about Grid itself, you just have to think about events, which is why we’ve put this cute little icon on storage accounts and quite a few others, all the things that were on the left side of that diagram that you saw for publishers.
So, if I go into here, and I click on “Events,” I can see who is subscribed to the storage account, I can see what’s in here, I can see there is one thing subscribed, there’s two, there’s a web app and a hybrid connection, which I’ll talk about in a second, and now I’m going to open up a CLI, make sure this thing’s actually still working, cool, and I’m going to go create a subscription on an event. First, I’ll show you the CLI tool for Grid, and if I go look at just running that command to get help on the CLI for Grid, I can see some new stuff that’s in preview now. You can see this says, “This extension is in preview,” this is stuff we haven’t really announced yet completely, but it is here.
So, some of the things you’ll notice: there’s a dead-letter endpoint you can specify now. So, that’s kind of new. Another one is that, for your endpoint type, you can see some new stuff like hybrid connection and storage queue, so now you can send your events to a storage queue instead of just an Event Hub or a WebHook. WebHook is the default for Grid, it’s always a push model. There are some other fun things in here like TTL, which, actually, I don’t think is supported, but we let you set the max delivery attempts. I think that one’s on today.
So, what I’m going to do now is I have a queue in a storage account and I’m going to go create a subscription to it for Grid right now. And I put that in a variable to make my life easier so I don’t have to cut and paste a bunch of crazy stuff. Oh, that’s right, I remember this. And because it’s a variable, I have to make sure I have a space in here. Okay. So, this is going to go create a subscription for that storage queue. You can see in the command here, I’ve got the endpoint type set to storage queue. And I put the ARM path to my storage account in there and then the path to the individual queue that I wanna go see.
And now, it’s created, everything’s hunky dory, looking nice. And so, what I can do now, is I can actually come into my storage account, because my event subscription’s on there, and go and add a file, upload a file into that blob, over here, upload something. I’ll just pick one of these files I’ve been playing with, I don’t even know what’s in these JSON files but I’m going to upload them anyway, overwrite. And so, this is uploaded now and what I can do is go find my queue wherever I put that queue. And here we go, the queue should be over here, and go look at what’s in that storage account.
So, if I come here…not in that storage account, oops. If I can find which storage account I put that in…you know what, I’ll make this easier on all of us and I’m just going to skip to the next one, which is the WebSocket test. So, the same thing that Clemens mentioned is that we can do this all through WebSocket. And the WebSocket thing that he showed is pretty cool. You use this node library, it’s very easy, but we can actually host the same stuff through something even more simple than that. And right here, what I’m going to show you is the code for this HTML page that I have got, that I just loaded.
I’m going to upload one more file right now just so we can look at it and watch this happen. And you know what, I wanna break this window out so you can see it happen live because that’ll be cooler. I’ll move back in time. So, if I go find another file that I wanna upload because this is…I mean, I’m behind a firewall here, you know, my computer has a firewall on so it’s not open directly to the internet. And I upload that file and what I’m going to get here is the push directly to my browser of the event that just happened. And you can see exactly what it was, you can see the name of the file, it is tmp2.json.
And what’s happening is I’ve plugged this browser directly into Grid. So now, I can do things like what we’re working on, I don’t know if it’s shipped yet, it’s for Visual Studio Code, to be able to debug right in your functions, on your machine, reacting to what’s happening in Grid. So, we’re getting a pretty rich experience now, things are coming along pretty well. This is kind of this eventing piece that’s pretty cool, and events are interesting because they are things you’re reacting to. It’s not like, “Oh, someone placed an order,” I guess you could do that, but it’s literally “Someone changed the environment or someone did something in storage some way,” or in any other service that’s onboarding now.
What I wanna do real quick is go back to this and then walk through now about…I’ll leave some time for questions at the end, but if you have any just shout them out as we go on, I’m not opposed to interruptions. So that’s like a “hello, world” from Grid. And now, I wanna do a quick introduction to Event Hubs. Who has used Event Hubs in this room? A few people, okay. Who’s heard of Event Hubs in this room? Okay. It should be everyone now, I’ve already said it. So, sorry, I’ve taken some cold medicine, and it’s starting to kick in, so this will get better and better as we go.
So, Event Hubs is a service we have that’s very similar to Kafka. It’s a telemetry service, it’s a streaming service, really, and that means it’s different from a queue and our intent is different from a queue. I hope there was a hint of that because it was on that previous slide. Wait, that slide like 10 minutes ago had Event Hubs on it, everyone should have said they’ve heard of it. So, how is a stream different from a queue? A stream is different from a queue, yeah, as Clemens mentioned, because it’s like a tape. So, you think about what a tape is.
A tape is something that’s recording and moving forward only. It’s not like a queue or stack or something that you’re putting stuff in and taking stuff out of, you are just writing to it. You can play this tape over and over again as many times as you want and so it works a lot like a cassette tape. Does anyone remember cassette tapes? Okay, cool. I keep waiting for the talk where people are just going to be like, “What is that?” I came close once. And just like playing back a cassette, playing back a stream is the same way, and that doesn’t matter if it’s a Kafka stream, an Event Hubs stream, or any of the other services out there that give you a partition log or streaming model.
When you’re playing a cassette tape on a tape deck, you can play from anywhere on the tape. You just fast forward and rewind to the point you want on the tape and then you press play, and you can play it as many times as you like, ignoring the degradation that happens on cassettes. But all tapes have a length, and for us, what that means is we’re basically looping the tape at the end. So, after a day, we’re going to overwrite the old data for the most part, so that’s how this particular tape works.
And to push this analogy a little bit further, when you look at streams, you think about partitions and consumers, something you’ll hear in the Kafka world or in our world, and channels or shards in some other places. A cassette actually has left and right audio channels. It’s why you hear music differently from the left side and the right side. And if you listen to, like, classic rock, you’ll hear a guitar on one side and drums on the other side, which was really so the record won’t skip. But that’s a good demonstration of the fact that when you press record, you’re recording both these channels but the data on each channel is actually different. So, you have multiple channels, the data is different, and we call these partitions in Event Hubs.
And on your stereo, the left and right speakers are each playing back one of those channels so they’re playing back different stuff. And that’s how the partition consumer model works. This is how I like to talk about Event Hubs to some extent, and this is a more architectural view: an Event Hub is really just an abstract construct, it’s just something that exists there kind of as a…too many people use the word container now, but I’ll just say it’s like a high-level construct. When you’re sending events to that Event Hub, that Event Hub has partitions, those partitions go and live on other servers that we own, and those partitions have data within them that’s in order and in a repeatable order. Just like the tapes, you can play it over again.
So, this would be a four-channel tape instead of the two-channel tape, which to bring this analogy full circle, a cassette actually has four channels on it because it’s got two sides and stereo on both. So, when you are…I did start my university career in audio engineering, I think so. So, when you’re sending events to an Event Hub, you get a choice of protocols, there’s actually more you can do here on Kafka, and you are just sending data in. And what we’re going to do, unless you take steps to tell us otherwise, is we’re going to round robin that data across all the partitions to give you really good throughput, and it’s going to give you really good availability.
If for some reason, we’re, like, doing an upgrade on one of those servers that’s hosting that partition and it’s not available for a few seconds, which is all it ever is, your sends will just jump to the next server automatically because we know that and we know what’s going through upgrades. So, this is the default behavior. You can also do things like use keys to pin data to a specific partition. So, I can tag, when I’m sending events, tag an ID on there that’s Dan, or my full name would be a better one. And then, what’s going to happen is we’re going to hash that to one of the partitions. And so, Dan will always hash to partition two, and so all Dan’s data is in partition two. And this will be more important when we look at how you read from Event Hubs.
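The two send behaviors described above, round robin by default and hashing a key to pin related events to one partition, can be sketched in a few lines of Python. This is a toy model only: `ToyPartitioner` is a hypothetical name, and the SHA-256 hash is an assumption chosen for illustration, not the hash Event Hubs uses internally.

```python
import hashlib
from itertools import count


class ToyPartitioner:
    """Toy model of partition selection; not the service's real algorithm."""

    def __init__(self, partition_count):
        self.partition_count = partition_count
        self._round_robin = count()  # 0, 1, 2, ... for keyless sends

    def choose(self, partition_key=None):
        if partition_key is None:
            # No key: spread events evenly across all partitions.
            return next(self._round_robin) % self.partition_count
        # Key given: the same key always hashes to the same partition.
        digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.partition_count


if __name__ == "__main__":
    partitioner = ToyPartitioner(partition_count=4)
    print([partitioner.choose() for _ in range(6)])       # cycles 0..3
    print(partitioner.choose("Dan Rosanova"))             # stable for this key
    print(partitioner.choose("Dan Rosanova"))             # same partition again
```

The trade-off from the talk shows up directly: keyless sends can land on any available partition, while keyed sends depend on one specific partition in exchange for keeping related data together.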
But the thing to know there is that I’m taking a dependency on the availability of a specific partition, but I’m getting locality of data for that, which is important when you’re reading. Because when you’re reading, to receive from this Event Hub, you need a reader for each channel that you have, for each partition you have. So, to read the whole stream, you need to connect to each partition. We give you tools to make that very easy, but fundamentally, that’s what’s happening. Each connection is established and it’s giving a client-side cursor when it connects to say, “I wanna read from location zero on partition one,” or, “I wanna read from location 100 on partition 2,” and that’s where the tape forwards to and plays forward. So, that’s how that all works.
You can have a lot of readers. So, you need one per partition but then you can have different sets of readers all connected in parallel and independently at different places in the tape. We call these consumer groups, so does Apache Kafka. Coincidence? I think not. Some people would call this model PubSub, I do not just because you can listen to the events twice, there is no filtering, there’s no broadcast, I don’t call this PubSub. But it is interesting and it’s super, super powerful. And thinking about big data streaming, or data streaming with Event Hubs, how does this look?
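The tape analogy, client-side cursors, and consumer groups can be modeled with a small in-memory Python sketch. `ToyPartition` and `ToyConsumerGroup` are hypothetical names for illustration; a real service also handles retention, leases, and networking, none of which is modeled here.

```python
class ToyPartition:
    """An append-only log: events are written once and never removed by reads."""

    def __init__(self):
        self.events = []

    def append(self, event):
        self.events.append(event)
        return len(self.events) - 1  # offset of the event just written

    def read_from(self, offset):
        return self.events[offset:]


class ToyConsumerGroup:
    """One independent set of client-side cursors, one cursor per partition."""

    def __init__(self, name, partitions):
        self.name = name
        self.partitions = partitions
        self.cursors = {i: 0 for i in range(len(partitions))}

    def seek(self, partition_id, offset):
        # Fast forward or rewind the tape for this group only.
        self.cursors[partition_id] = offset

    def poll(self, partition_id):
        events = self.partitions[partition_id].read_from(self.cursors[partition_id])
        self.cursors[partition_id] += len(events)
        return events


if __name__ == "__main__":
    partitions = [ToyPartition(), ToyPartition()]
    for event in ["a", "b", "c"]:
        partitions[0].append(event)

    dashboards = ToyConsumerGroup("dashboards", partitions)
    archive = ToyConsumerGroup("archive", partitions)

    print(dashboards.poll(0))  # reads from the start
    archive.seek(0, 1)
    print(archive.poll(0))     # starts at offset 1, unaffected by the other group
    dashboards.seek(0, 0)
    print(dashboards.poll(0))  # rewinding replays the same events in the same order
```

Reading never removes anything from the log, which is the key difference from a queue: each group just moves its own cursor.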
Well, when you think about the data flowing around in your organization, really, much to our chagrin for everyone in this room probably, business people tend to really only care about this side, right? The far side, so the presentation and action. Like, how much stuff did I sell yesterday? What was the slowest delivery that I had to make? Where are things breaking down? So, this is kind of where the value starts from a business side, and where the information starts from, like, a real world side, is that you have producers of events or of telemetry, they can be applications, they can be devices, they can be servers, all sorts of things. And then, what you are doing, your job in integration, is really to make those two ends magically meet together.
So, no one usually asks what’s in the middle, unless it breaks, but this is kind of how you design those pipelines whether on Azure or somewhere else, is that you’re doing some form of collection or direct streaming into the cloud, you have to land this data somewhere so you can process it in the time that you’re able to. You do stream processing on it, whether that’s fast or slow, it could just be file-based processing, which is very popular, or it could be real-time processing with things like stream analytics. And then you’re moving to some sort of long-term storage and then to the presentation that actually makes business people happy. So, that’s kind of how we put all these pieces together.
I’m going to show you real quick “hello, world” in Event Hubs, how that looks for us, I’ll go into the portal over here, and go into…Surprise, surprise. I’m also going to show you Kafka, which is why this resource group is called Dan’s Kafka. So, if I go look inside of this resource group…let me close this…and go take a look at my Event Hub I have in here…here we go, “No resources to display,” great. I’ll cheat and go through here. Here we go. I have an Event Hub, this one’s in West U.S. Actually, as of today, unfortunately, it’s today, North America time, I was hoping to show it today, Kafka is going to be available in Europe as well and in Central U.S. so that’s our big news for today.
And if I go look at an Event Hub I happen to have, here is one, it’s pretty easy plus new in the portal, it’s just got a simple topic on here. And if I wanna go see how I talk to this thing, I can actually take the connection string off of it from these shared policies, I’m not going to do that right now, and if I look at it in code, so this is code to send to that Event Hub. This is pretty easy code, it’s just creating a client from a connection string, and then it’s going to do a for loop and send a bunch of stuff. This happens to be ticker symbols or bid-ask spreads for trading if you’re into that sort of thing, and this is pretty easy. Again, I’m not doing any sort of key stuff, it’s just literally for and send.
On the receive side, we give you this tool, Event Processor Host, that most people, I think in the past at least, used to read from their Event Hubs. It’s really built around this idea of an event processor, you make a class that implements this interface, IEventProcessor, and it gives you four methods. The four methods are close, open, process errors, and process events. You can guess what those all do, I’m not going to take time to explain them. But this one, I’m just going to write these out. So, if I start this up, just press start, we need two console windows, a consumer and a producer, and you can see my producer sending these events all the way to California and it is done already.
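The four-method shape described above (open, close, process errors, process events) can be sketched in Python as follows. The real Event Processor Host is a library that also manages partition leases and checkpoints; the `RecordingProcessor` class and `run_host` function here are hypothetical stand-ins that only show when each callback fires.

```python
class RecordingProcessor:
    """Hypothetical processor with the four callbacks named in the talk."""

    def __init__(self):
        self.calls = []

    def open(self, partition_id):
        self.calls.append(("open", partition_id))

    def process_events(self, partition_id, events):
        for event in events:
            self.calls.append(("event", partition_id, event))

    def process_error(self, partition_id, error):
        self.calls.append(("error", partition_id, str(error)))

    def close(self, partition_id, reason):
        self.calls.append(("close", partition_id, reason))


def run_host(processor, partitions):
    """Toy host: no leases, no checkpoints, one pass over each partition."""
    for partition_id, events in partitions.items():
        processor.open(partition_id)
        try:
            processor.process_events(partition_id, events)
        except Exception as error:  # route failures to the error callback
            processor.process_error(partition_id, error)
        finally:
            processor.close(partition_id, "shutdown")


if __name__ == "__main__":
    processor = RecordingProcessor()
    run_host(processor, {"0": ["bid=1.10", "ask=1.12"], "1": ["bid=1.11"]})
    for call in processor.calls:
        print(call)
```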
And then, in a second, my consumer starts up because it’s getting leases on these partitions and you can see that it’s reading all of these events now as we speak. And so, that is super simple, 100-level Event Hubs running. And if I go now look at something a little more interesting, I can actually see…Oh, I guess I’ll walk through this first and then I’ll do something more interesting. So, the concepts of a stream are different. You get that repeatable read, stable order, so you can read the same stream like four hours later and you’re going to get the same results if you’re doing calculations. That’s the idea behind this.
They can contain order, you usually partition streams to scale, the idea is not to make them scale up, it’s to make them scale out, and they are well suited for structured or unstructured data and you might actually put events in here. And this is an important point about where streams and events meet, is that you can subscribe to events with a log. So, you can actually create an Event Grid subscription that is an Event Hub so you can be logging all the events from, like, a storage account or from your entire Azure subscription. You can produce events from a stream as well.
So, if you’re reading a stream and you’re looking for, like, changes in the stock stuff, if you’re seeing the bid-ask spread invert, that’s probably…you could literally make money off that. So, you probably wanna raise events about that. And a stream processor and a complex event processor are two different things, they’re not exactly the same so keep those in mind about where these meet and how these services all compose. The idea here is that you can have a topic like your storage account and sometimes you want other messaging services to be those subscribers, like storage queues, which are available now, or Event Hubs, which has been available for quite a while, and your Grid messages will go there.
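As a small illustration of producing events from a stream, the sketch below scans a sequence of quotes and emits an event whenever the bid exceeds the ask, which is the spread inversion mentioned above. The quote fields (`symbol`, `bid`, `ask`), the event type name, and the `inverted_spread_events` function are all assumptions made for the example; they are not taken from the demo code.

```python
def inverted_spread_events(quotes):
    """Yield one event per quote whose bid is higher than its ask."""
    for quote in quotes:
        if quote["bid"] > quote["ask"]:
            yield {
                # Hypothetical custom event type and subject layout.
                "eventType": "Example.Quotes.SpreadInverted",
                "subject": f"/symbols/{quote['symbol']}",
                "data": quote,
            }


if __name__ == "__main__":
    stream = [
        {"symbol": "MSFT", "bid": 100.10, "ask": 100.12},  # normal spread
        {"symbol": "MSFT", "bid": 100.15, "ask": 100.11},  # inverted
        {"symbol": "MSFT", "bid": 100.09, "ask": 100.13},  # normal again
    ]
    for event in inverted_spread_events(stream):
        print(event["eventType"], event["subject"], event["data"])
```

Most of the stream produces nothing; only the individually significant moments become events, which is the distinction between the two models.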
Sometimes, you want a WebHook, which is sort of the default experience for Grid and for functions with Grid, and sometimes, you want a queue because these things have different semantics, they work different ways, they’re really meant to solve different things. And why would you wanna do this? Why would you wanna change these? A lot of that comes down to scale. There is a really good blog post out about using functions to process 100,000 messages a second from Event Hubs. That’s possible because of what Event Hubs gives you in, like, scale characteristics of a sessionful connection where you can just jam a bunch of events through it.
That would be very hard to do with WebHooks, purely on any platform, because we’re going to spend a lot of time doing HTTP traffic because WebHooks are like a request-response. If you’re doing 100,000 request-response WebHooks a second, you’re spending a lot of CPU on that. And Grid will give you all of these and you can change your bindings in place without ever really changing your code, that’s a big promise for bindings and functions that we’re pretty happy with. I already went over the new Grid features. Forgot to move that one.
Stream Analytics, has anyone in this room used Stream Analytics at all? Okay, a couple people. Stream Analytics is a really cool service that I actually have running in the same Azure subscription on the same data. So, if I go back and look at Stream Analytics, I have it reading from this Event Hub and this job is very simple. Stream Analytics has, like, a DSL, a query-like language, and I can go see my specific query. I should have changed this because you never want to do a query like this. If you’re doing this in Stream Analytics, you’re doing something wrong. You really just shouldn’t do select *. It’s really made for windowing so you can say, like, “Give me the top value in every five minutes,” or “Make an output record whenever the average of this value exceeds some threshold.” So, it’s really meant for more sophisticated stream-based processing rather than just select * type stuff.
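The windowing idea can be shown outside of Stream Analytics with a short Python sketch: group readings into fixed, non-overlapping (tumbling) windows and emit an output record only when a window’s average exceeds a threshold. This is a batch approximation written for clarity, not the Stream Analytics query language, and the function name and input shape are assumptions.

```python
from collections import defaultdict


def tumbling_window_alerts(readings, window_seconds, threshold):
    """Return (window_start, average) for each window whose average > threshold.

    readings: iterable of (timestamp_in_seconds, value) pairs.
    """
    buckets = defaultdict(list)
    for timestamp, value in readings:
        # Align each reading to the start of its fixed-size window.
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)

    alerts = []
    for window_start in sorted(buckets):
        values = buckets[window_start]
        average = sum(values) / len(values)
        if average > threshold:
            alerts.append((window_start, average))
    return alerts


if __name__ == "__main__":
    readings = [(0, 10), (100, 20), (299, 30), (300, 50), (450, 70), (600, 5)]
    # Five-minute windows; only windows averaging above 25 produce output.
    print(tumbling_window_alerts(readings, window_seconds=300, threshold=25))
```

Compared with passing every record through unchanged, the output here is much smaller than the input, which is the kind of reduction a windowed query is meant to provide.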
You can see here I’ve got a simple query. It’s got an input and an output, the input’s an Event Hub, the output’s a file. And what this does is it just receives all of the data and writes it into a file. So, if I go back to my job and turn it on, I can actually just start writing out stuff to a file. Let’s do it, start with now. And if I go look at the actual file that’s output, which is in this storage account here, I’ll send some more data by running my app again. And then, what we’ll be able to do is see stuff being written to the storage account really quickly, which is really the goal of what we want here. So, start this up, come back to the browser, now you see why I don’t demo live.
And so, as I refresh this, what I’ll see here is that this file will actually get modified here pretty quickly as soon as those new events go through. And that Stream Analytics job is literally just writing the records as they come in so it’s really not doing something useful that I would necessarily care about, but it’s cool. The more important things, though, that I wanna show you, or more interesting things to me, are how this works with Kafka and our serverless story. So, this is a job, a function, a function that’s reading from an Event Hub. When you create functions, you do like plus new on a function…here, I can just show you this and then cancel out of it.
You can pick what trigger you wanna have. So, I can come down here and find the Event Hub trigger and pick what language I wanna write stuff against Event Hub in, and that’s what I’ve already done here. That was not the click I wanted to make. And we’ll go look at a very simple function that I have in place here. Open. Okay. And for this simple function, I’m going to expand the logs here, I’m going to go run that app one more time, and go see it create some data that we will see here. And now, what you can see is, yeah, I have event processor host, and that’s cool, and I can run console applications to do this stuff, but I can also just have the cloud sitting here, reading these events as they come through this Event Hub.
And so, let’s see, I’ve got my producer running. And now, you can see all of that same data is just flowing into this function, and it’s literally this code. I can do whatever I want in here, I’m just writing to the screen. It’s pretty fast, pretty easy, pretty powerful. And then, you get some other fun stuff with this as well. So, all of these toys are meant to compose together. I got a little bit of time, I think. Is it all right? Yeah. And another service I wanna show you real quick is Time Series Insights. Has anyone played with that at all? One hand, two, okay, three.
So, Time Series Insights is a cool service in Azure that’s giving you a view over that log that you can search and visualize so you can see what’s going on inside of here. So, if I actually go here, open up my Explorer for TSI, this thing is just…I’d never really used it much before, but it took me about four minutes to set up, maybe three, on cold medicine, so it should be quicker for other people. And what that’s giving me is you can actually see already that it’s showing me live that I’ve got data that’s gone through this. And I can do things from here. Like, this is not just going to give me visualization, but I can actually go explore these specific events.
And what it’s doing now is it’s enumerating through that stream and showing me what’s on here. I can download these things, I can start to split them. So, if I had different ticker symbols in here, I could actually split them…let me add symbol because you can actually see these are MSFT, which is a new record today. Yes. And you could split them then by symbol in here and start exploring the data within here, which is pretty interesting. And now, here’s where it all gets even more interesting is we look at what we have right now…let me make sure I’m actually on track with what I’m supposed to be talking about.
Not a lot of people use Kafka in this room. It’s actually got a very broad ecosystem, Apache Kafka is a really popular platform in the startup communities. People like it because it’s open source and because it’s very easy to get started with. But the things I would warn people about are, like a lot of open source, it’s “free,” you know. So are puppies if you adopt them from a shelter but they’re not free to keep. So, it’s free like a puppy, not like a beer. You have to feed your Kafka, you have to take care of it. But it is cool and all the cool kids use it so maybe if you like facial hair and hats and glasses, you might be tempted to use Kafka as well.
So, we have actually added support into Event Hubs for it. Clemens talked a little bit about that, it’s protocol-level support. And what this really is, is that it is running on top of Event Hubs. We did not implement Kafka as like a runtime. What we did was we implemented the protocol, which is open, and we’ve been about open protocols for a long time on this team, since before I joined 1,491 days ago. So, what we do is protocol translation and this gets all of the features that you get with Event Hubs automatically with that. And I’ll walk through some of those features right now.
Auto-inflate is actually a really cool feature we have for Event Hubs because this solves, probably, the first biggest problem we had for Event Hubs, which was that people would be running with their Event Hubs, everything’s fine, the smallest chunk we sell is 1,000 messages a second. That’ll let you do 86 million events a day, which will cost you $3.14 U.S. So, if you can do eventing for cheaper and faster and more reliable, please feel free to. But what you got from Event Hubs was, you already got your 1,000 messages per second, that’s actually quite a bit, you can run it for a long time and you never really have a problem.
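The arithmetic behind that figure can be checked quickly. This short sketch uses only the two numbers quoted in the talk (1,000 messages per second and $3.14 per day); the per-million cost is simply derived from them and is not an official price.

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400 seconds

events_per_second = 1_000               # smallest unit, as quoted in the talk
events_per_day = events_per_second * SECONDS_PER_DAY

daily_price_usd = 3.14                  # daily figure quoted in the talk
cost_per_million_events = daily_price_usd / (events_per_day / 1_000_000)

print(f"{events_per_day:,} events per day")
print(f"about ${cost_per_million_events:.4f} per million events")
```

So "86 million" is 86.4 million rounded down, assuming the unit runs at its full rate for the whole day.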
And what would happen is, our biggest support case would be people would call after like three or four or five months in production and be like, “What’s going on? I’m getting errors now that I’ve never gotten before. Something’s wrong,” and it’s that they’ve only purchased this one throughput unit. So, we created a feature that will bump you up one throughput unit at a time, automatically, instead of throwing errors for you. And this is on by default now and you can actually tone it down. So, you can say, “You know what? I don’t wanna pay too much money, I don’t wanna pay $60 a day. I only wanna pay $30 a day so I’ll stop this at 10 megabytes per second,” you know. So, you have that sort of control or you get other things…and this one actually has it on so it’s probably worth going to look at it.
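The behavior described here, stepping up one throughput unit at a time until a ceiling the customer chose, can be sketched as follows. This is an illustration of the idea, not the service's actual algorithm: the function name, the parameters, and the assumption that one throughput unit corresponds to 1 MB per second of ingress are all made for the example.

```python
def auto_inflate(current_units, demand_mb_per_s, max_units, mb_per_unit=1):
    """Return (new_units, throttled) after stepping capacity up toward demand.

    Assumptions for this sketch: one throughput unit handles mb_per_unit
    MB/s of ingress, and capacity only ever grows (no scaling down).
    """
    units = current_units
    # Add one unit at a time, never going past the configured ceiling.
    while units * mb_per_unit < demand_mb_per_s and units < max_units:
        units += 1
    # If demand still exceeds capacity at the ceiling, senders see errors again.
    throttled = units * mb_per_unit < demand_mb_per_s
    return units, throttled


grown = auto_inflate(current_units=1, demand_mb_per_s=3, max_units=10)
capped = auto_inflate(current_units=1, demand_mb_per_s=15, max_units=10)
```

The ceiling is what gives the cost control mentioned above: below it, growth replaces errors; at it, the sender is throttled instead of the bill growing further.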
You get other things like Capture. Capture is a feature that we built because the whole world is still batch and this stream thing is a real-time construct, but a lot of the world still runs on batch and that’s just reality, that’s just the way it is. I’m sure my last paycheck will be in a batch, I’m sure my death certificate will be probably with a bunch of other people’s and probably on an AS/400. And so, if we go look at…and that doesn’t matter how long I live. So if we go look at what Capture is, it’s a feature where we give you a time and size window. You say how many minutes and how many megabytes you wanna close your window on, give us a storage account, and what we’ll do is we’ll write the data for that partition to that storage account as it’s happening.
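The time-and-size window can be pictured with a small sketch. This is not how the Capture feature is implemented; it is a simplified, single-partition illustration in which a batch is closed as soon as either limit is reached. The class name, method names, and the use of caller-supplied timestamps are assumptions for the example, and unlike the real service the sketch only checks the clock when an event arrives.

```python
class CaptureWindow:
    """Buffer events and close the batch on whichever limit is hit first."""

    def __init__(self, max_seconds, max_bytes):
        self.max_seconds = max_seconds
        self.max_bytes = max_bytes
        self.buffer = []
        self.buffered_bytes = 0
        self.window_start = None

    def add(self, timestamp, payload):
        """Add one event; return the closed batch, or None if the window stays open."""
        if self.window_start is None:
            self.window_start = timestamp  # first event opens the window
        self.buffer.append(payload)
        self.buffered_bytes += len(payload)

        time_up = timestamp - self.window_start >= self.max_seconds
        size_up = self.buffered_bytes >= self.max_bytes
        if time_up or size_up:
            return self._close()
        return None

    def _close(self):
        # In the real feature this is where a file would land in the storage account.
        batch = self.buffer
        self.buffer, self.buffered_bytes, self.window_start = [], 0, None
        return batch


window = CaptureWindow(max_seconds=60, max_bytes=10)
first = window.add(0, b"abcd")       # window stays open
second = window.add(10, b"efgh")     # still under both limits
by_size = window.add(20, b"ij")      # 10 bytes buffered: closes on size
third = window.add(30, b"x")         # opens a new window
by_time = window.add(95, b"y")       # 65 seconds elapsed: closes on time
```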
I have Capture running right now on this already so we can go look at that if we want, actually, since I sent data through here. But what’s cool now is you can actually do Capture with Kafka. So, when you turn on the Kafka feature, you can send through Kafka, and now you can have Kafka writing the stream, someone else is writing Kafka because they know how to do Kafka, and you can get files that contain the data in that stream. I have Kafka on this Event Hub and I wanna show you real quick what that actually looks like.
As you can see, this is Eclipse. For some of you, this may be a long time since you’ve seen this, and you can see there is nothing from Microsoft in here, this is literally just…these are the imports for this, it is literally just Java and Kafka. And when you’re…this is a sender for Kafka, this looks pretty familiar, right? It looks a lot like what I was doing in C# actually. Coincidence? No, I think not. And so, the things you have to do here to make this work are you have to tell us where is the server, this happens to be a DNS, and you have to tell us how you’re going to authenticate. So, these are the properties you need to put in place to make Kafka work.
As of Kafka 1.0, you can do these in a config file rather than through code. So, if you’re running Kafka 1.0 stuff somewhere, you can literally change the config file on the producer, or the consumer, and it will work with our Azure Event Hubs for Kafka ecosystems. And so, if I go and run this, let me turn on…you know what? Let’s go look at that function. That function was cool and it’s already listening. So now, if I go back to my friendly function and I see it load back up, and go look at the output for that function, open the logs. And now, I’m going to go start this Kafka producer and this thing is going to run and my function is now going to start writing out once Eclipse finally kicks itself to life. It did, so I believe it’s already running.
And now, these are the things coming out of Kafka. So, this is not being sent with Event Hubs but you saw, in this function, this is bound to Event Hubs so it’s doing protocol translation live for us now that it works for Kafka, and so too does the output of the capture files, which is pretty cool because…oh, that’s why that wasn’t working because that’s the wrong storage container. Awesome, great. Glad I remember that now. So, if I go look at my streaming stuff, I can go see the Capture files that were created for that. And if I go look at the blobs, they will have my Capture and I can see my Event Hub, I can see the name of the hub, I can see the partitions.
And then, you get to pick how you want this naming to work. So, you know what, I wanna go today’s, not yesterday’s. So, oh, God, no, really? Glad Leon’s not here, I would’ve had to say something there. Click back through this, I went to the fourth not the third, any time is fine. And if I go look at these individual files, in each of these will be the Avro file that was created for that time. So, this one’s kind of an empty Avro file. If I do the most recent ones, they’ll have data in there from this. And the whole idea here is that what we’re trying to do is give you all the tools you need to make this work for you.
Event Hubs also has GeoDR, which I’m not going to show today, but that’s a way to do failover that you control between regions. So, if you wanna fail over between East and West, that works with Kafka in the regions that Kafka is running, works with Event Hubs in any region, so you get recovery time objective built into the service. Real quick, looking at what it is like to migrate from Kafka to Event Hubs for Kafka ecosystems, this is a config file, I mentioned the config file, these were the three things you have to change. So, it’s your bootstrap server, your SASL mechanism set to PLAIN, and the sasl.jaas.config setting here. And then, setting SASL_SSL is the piece…you already saw this.
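As a rough picture of that config change, the sketch below builds the client properties as a Python dictionary and renders them in `.properties` form. The talk only names the settings; the endpoint pattern (`<namespace>.servicebus.windows.net:9093`), the `$ConnectionString` username, and the `PlainLoginModule` line are written from memory of the Event Hubs for Kafka documentation and should be checked against it. The namespace and connection string values are placeholders, and the helper names are hypothetical.

```python
def event_hubs_kafka_properties(namespace, connection_string):
    """Build Kafka client settings for an Event Hubs namespace (illustrative)."""
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        'username="$ConnectionString" '
        f'password="{connection_string}";'
    )
    return {
        # 1. Point the client at the Event Hubs endpoint instead of a broker list.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        # The transport piece: SASL over TLS.
        "security.protocol": "SASL_SSL",
        # 2. Authenticate with the PLAIN mechanism.
        "sasl.mechanism": "PLAIN",
        # 3. Supply the credentials through the JAAS configuration.
        "sasl.jaas.config": jaas,
    }


def to_properties_text(props):
    """Render the settings as key=value lines for a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


# Placeholder values, not real credentials.
props = event_hubs_kafka_properties("example-ns", "Endpoint=sb://example")
config_text = to_properties_text(props)
```

Nothing in the producer or consumer code itself changes in this picture; only these settings do, which is why an existing Kafka application can be repointed by editing its config file.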
And the thing I kind of wanna leave you with is when you think about Event Hubs or Event Grid, or any of our messaging stuff, you really have to think about what is the tool you need, you know? Don’t think about the tools, think about the problem. So, okay, someone gave me soup, I need the spoon, you know. Someone’s giving you a stream of data, you need Event Hubs. If someone’s trying to give you a way to subscribe to events or to be reactive to things or you’re building a serverless thing, you probably need Event Grid to kick that off, at least, or to do some sort of orchestration in that or start your orchestration.
This is a gross oversimplification but an easy way to think of this is Event Hubs is for fanning your data in, lots of producers, a few readers, maybe 5, 10, 20 readers of the whole stream max, but you could have thousands or tens of thousands or hundreds of thousands of producers. Grid is the opposite. Your data, like your storage account, you’re not going to have 100,000 storage accounts, I don’t think, I’d be really surprised, but you might have 100,000 listeners for one storage account. So, that is the design, so they are two separate things. Think about that, remember there is no spoon. There you go, yeah, there is no spoon. And that’s your Zen moment for the day, I hope, and I will be around the rest of the week for any questions you have.
We have like two minutes for Q&A if anyone wants any now. This is Seamus, he goes in all my presentations. So, that’s him at the beach. Thank you very much and please go and try these toys if you haven’t. Event Hubs is actually really popular for integration, you’d be surprised, especially with the Capture feature. It works really well for replacing these sort of SSIS-type things that people are doing, especially if they’re doing them with modern applications that are going to be using like log4j or log4net. And you can pipe those appenders right to Event Hubs or to Kafka. Thank you.