Overview of Kafka Summit 2023
Welcome to the world of event-driven architecture! In today's rapidly evolving technological landscape, event streaming has emerged as a pivotal element in building robust and scalable systems. With its ability to process and analyze data in real time, event-driven architecture is revolutionizing industries across the globe.
Kafka Summit, the premier event for the Apache Kafka® community, is your gateway to discovering the latest advancements and insights in event-driven data streaming. As data engineers, developers, architects, and DevOps professionals gather, they bring with them a wealth of knowledge and experience in harnessing the power of Apache Kafka.
In our blog, we will dive deep into the key takeaways from Kafka Summit London 2023, focusing specifically on event-driven architecture. Join us as we explore how companies of all sizes are rethinking their software architectures to leverage real-time context. We will uncover the transformative impact of event-driven systems, as monoliths give way to microservices, on-prem deployments migrate to the cloud, and batch processing makes room for stream processing.
Business Analysis of the Evolution of Data Streaming and the Event Streaming Platform Stack
Important Points:
Introduction
Kafka Summit London 2023 opened by highlighting the power of community.
Growth in data streaming driven by trends like IoT, cloud computing, AI, and customer experience demands.
Shifting Software Architecture
Software architecture is transitioning, in the keynote's metaphor, from single-celled to multicellular organisms.
Increasing interconnection between different parts of software within organizations.
Data streaming as the connective tissue that plugs applications and data systems together.
The Role of Data Streaming Platform
Data streaming platform emerging as one of the most important data platforms in companies.
Kafka as the foundation of the emerging stack for event streaming platforms.
Data streaming platform connects applications, processes streams, and governs data at scale.
The Evolving Stack
The stack includes Kafka, stream processing frameworks, connectors, and data governance tools.
Each layer of the stack contributes to a comprehensive platform for event streaming.
Exciting innovation happening in each layer, with new products, features, and open-source contributions.
The Focus on Kafka
Kafka has become the foundation of the streaming world.
Ongoing advancements and exciting changes in the Kafka project.
Recent internal re-architecture project: removal of ZooKeeper for Kafka's own metadata management.
Future Outlook
Simplifying Kafka management and improving scalability.
Planned enhancements and features on the horizon.
Exploration of future ideas for Kafka's development.
Conclusion
The event streaming platform stack continues to evolve and empower businesses.
Exciting opportunities for innovation, networking, and learning in the Kafka community.
Benefits of Cloud-native Event Streaming Platform (from a Business Perspective)
Community Collaboration: The success of Apache Kafka and the event streaming platform is attributed to the collaboration and contributions of a global community of enthusiastic users who attend meetups, participate in forums, and help each other.
Evolution and Exciting Changes: Kafka has evolved from a low-level commit log to a rich data streaming platform. It has experienced incredible adoption across various industries and use cases, driven by trends like real-time data, IoT, cloud, AI, and machine learning.
Connective Tissue for Applications: Data streaming acts as the connective tissue that links different applications, data systems, and processes within an organization. It enables companies to act in real time and react to events and changes happening across the organization.
Key Developments in Kafka: Recent advancements include removing ZooKeeper dependency, allowing Kafka to manage its own metadata, making it simpler and more scalable. Work on storage integration with S3 has gained significant adoption and will be part of the mainstream Kafka offering.
Future Roadmap: Planned enhancements include improving client ecosystem and protocol, organizing topics for better management and permissions, and simplifying partitioning in Kafka. These initiatives aim to simplify the ecosystem, ensure compatibility, and support efficient scaling.
Cloud-native Approach: The introduction of a cloud service like Confluent Cloud, built on the Kora engine, offers a cloud-native event streaming platform experience. Kora was designed to be multi-tenant, operate at scale, and provide elasticity, resiliency, and efficient resource management.
Benefits for Customers: Customers using the cloud-native event streaming platform experience improved scalability, elasticity, and resiliency. They can leverage automation and optimized storage tiering for enhanced performance and reliability.
Higher SLA and Operational Efficiency: Confluent Cloud provides a higher SLA compared to self-managed Kafka clusters, offering better availability and reliability. The cloud-native approach significantly improves resource efficiency and reduces operational complexities.
Continuous Innovation: The open-source nature of Kafka, along with commercial support and contributions from various companies, drives continuous innovation in the event streaming ecosystem. The community and broad commercial support play a crucial role in shaping the future of Kafka.
Advancements in Event Streaming Platform (from a Business Perspective)
Cost Savings and Competitive Pricing: Confluent Cloud, powered by the Kora engine, offers significant operational advantages and efficiencies, enabling cost savings for customers. The cloud-native approach and underlying system efficiency allow Confluent to provide competitive pricing. The Cost Savings Challenge helps customers estimate and optimize their costs of running Kafka in the cloud.
Performance and Latency Improvements: Kora demonstrates superior performance, even compared to open-source Kafka clusters. The cloud service's internal engine ensures low and predictable latency, with ongoing work to further reduce latency through advancements in the replication engine.
Cloud-Native vs. Open Source: While the open-source Kafka provides flexibility and allows running on-premises or in self-managed environments, cloud services like Confluent Cloud offer distinct benefits, including scalability, elasticity, resiliency, and efficient resource management. Both have their place, catering to different needs.
Connectors: Connectors play a vital role in enabling data flow between systems in the event streaming platform. The ecosystem of open-source connectors has grown significantly, and Confluent offers over 70 fully managed connectors in its Cloud offering. Custom connectors are now available, allowing users to connect to any system and benefit from the same operational advantages.
Governance: Governance becomes crucial as organizations scale their event streaming platforms. Confluent Cloud provides features for stream governance, including schema management with a schema registry, metadata management, lineage tracing, and data quality rules. The new functionality extends data quality rules to support more nuanced data validation and actions.
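The idea behind data quality rules can be sketched in plain Python. This is only an illustration of the pattern (validate each record against named rules and route failures to a dead-letter destination), not Confluent's actual API; all names and fields here are hypothetical.

```python
# Illustrative sketch of stream-level data quality rules: each rule is a
# predicate over a record; records that fail any rule are routed to a
# dead-letter list instead of the clean output stream. Names are made up.

def apply_quality_rules(records, rules):
    clean, dead_letter = [], []
    for record in records:
        failed = [name for name, check in rules.items() if not check(record)]
        if failed:
            dead_letter.append({"record": record, "violations": failed})
        else:
            clean.append(record)
    return clean, dead_letter

rules = {
    "has_user_id": lambda r: bool(r.get("user_id")),
    "amount_positive": lambda r: r.get("amount", 0) > 0,
}

events = [
    {"user_id": "u1", "amount": 9.99},
    {"user_id": "", "amount": 5.00},    # missing user id
    {"user_id": "u2", "amount": -1.0},  # invalid amount
]
clean, dlq = apply_quality_rules(events, rules)
# Only the first event passes; the other two carry their violation names.
```

In a managed platform, the same decision (pass, quarantine, or transform) would be declared as a rule attached to the schema rather than hand-coded per application.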
Stream Processing: Stream processing is an essential layer in the event streaming platform, enabling building applications around streams of data. Various technologies exist, ranging from simple libraries like Kafka Streams to more comprehensive frameworks like Apache Flink. Flink has emerged as a de facto standard in the space, providing powerful capabilities for stream processing and application lifecycle management.
Continuous Innovation: The event streaming platform ecosystem continues to evolve with ongoing innovations in open source and cloud services. Companies like Confluent drive advancements in areas such as cost optimization, performance, connectors, governance, and stream processing to meet the diverse needs of users.
Advancements in Event Streaming Platform (from a Business Perspective) - Part 2
Evolution of Data in Motion: Similar to the evolution of data at rest, where databases provided higher-level abstractions and simplified application development, there is a need for technology that eases building scalable applications around streaming data. Apache Flink has emerged as a powerful stream processing platform, analogous to how databases transformed data at rest.
Flink's Growth and Adoption: Flink has witnessed significant adoption, following a similar trajectory as Kafka, particularly among technically sophisticated companies. Flink's versatility, offering declarative SQL, programmatic APIs, batch and streaming capabilities, and stateful functions across various programming languages, positions it as a comprehensive stream processing solution.
Cloud-Native Flink Offering: Confluent recognizes the importance of Flink in the streaming ecosystem and has developed a cloud-native offering around it. The initial release includes core capabilities such as auto-scaling, load balancing, fault tolerance, and a fully managed experience. The goal is to provide an integrated data streaming platform that seamlessly combines Kafka, Flink, connectors, governance, and more.
Accessibility and Ease of Use: Confluent aims to simplify the streaming platform ecosystem, enabling users to focus on building applications rather than integrating components. The integration between Kafka and Flink is seamless, with Kafka topics automatically appearing in Flink SQL's metadata store. The cloud-native Flink offering is designed to be accessible across multiple clouds, with AWS being the initial release, followed by other cloud providers.
Early Access and Roadmap: Confluent is launching an Early Access program for the SQL API of the Flink offering. It will be generally available by the end of the year. Programmatic access for other Flink APIs is planned for 2024.
Acquisition of Immerok and Community Collaboration: Confluent acquired Immerok, a startup with top Flink contributors, to strengthen its Flink expertise and leverage their community involvement. The collaboration between the teams aims to enhance the Flink offering and contribute to the evolution of the Flink community.
Advancements in Event Streaming Platform (from a Business Perspective) - Part 3
Cloud Native Companies and Digital Ecosystems: Cloud-native companies, such as those in new media (video streaming, online gaming, social media), have leveraged cloud infrastructure to create new digital ecosystems and reach broader audiences. Event streaming platforms play a crucial role in powering these digital transformations.
Confluent Cloud Environments: The demo showcases three main environments: integration, quality of service, and engagement. Each environment has its own Kafka cluster(s) to store raw events and business data. Compute pools provision resources for Flink SQL queries, offering auto-scaling and workload management.
Integrated Metadata Model: Flink SQL provides a metadata model that integrates with Confluent Cloud. Catalogs map to environments, databases map to Kafka clusters, and tables map to Kafka topics. The integrated metadata simplifies data exploration and analysis.
Data Exploration: Flink SQL enables data exploration through queries. Descriptive analysis, such as counting unique viewers and measuring popularity, can be performed. Flink's auto-scaling capabilities and seamless integration with Kafka topics allow for real-time analysis and monitoring.
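A descriptive query like "count unique viewers per video" can be mimicked in a few lines of plain Python, just to make the computation concrete. The field names (`video_id`, `viewer_id`) are illustrative, not taken from the demo.

```python
# Pure-Python sketch of the kind of descriptive analysis Flink SQL would run
# over a topic: counting unique viewers per video. Field names are made up.
from collections import defaultdict

def unique_viewers(events):
    viewers = defaultdict(set)  # video_id -> set of distinct viewer ids
    for e in events:
        viewers[e["video_id"]].add(e["viewer_id"])
    return {video: len(v) for video, v in viewers.items()}

plays = [
    {"video_id": "v1", "viewer_id": "a"},
    {"video_id": "v1", "viewer_id": "a"},  # repeat view by same viewer
    {"video_id": "v1", "viewer_id": "b"},
    {"video_id": "v2", "viewer_id": "a"},
]
counts = unique_viewers(plays)  # {"v1": 2, "v2": 1}
```

The difference in Flink SQL is that the same aggregation runs continuously over an unbounded Kafka topic, with the engine handling scaling and state.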
Complex Event Processing: Flink SQL supports complex event processing, where new events are derived based on patterns in the data. The example demonstrates detecting videos that require rebuffering and triggering automated responses.
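The rebuffering example boils down to deriving a new event when a pattern appears in the stream. A minimal sketch of that pattern-matching idea, assuming a made-up event shape and threshold:

```python
# Sketch of complex event processing: emit a derived "take action" event when
# a session shows N rebuffer events in a row. Event shape, threshold, and the
# chosen response are all illustrative.

def detect_rebuffering(events, threshold=3):
    alerts = []
    streak = {}  # session_id -> current run of consecutive rebuffer events
    for e in events:
        key = e["session_id"]
        if e["type"] == "rebuffer":
            streak[key] = streak.get(key, 0) + 1
            if streak[key] == threshold:
                alerts.append({"session_id": key, "action": "reduce_bitrate"})
        else:
            streak[key] = 0  # any other event breaks the streak
    return alerts

stream = [
    {"session_id": "s1", "type": "rebuffer"},
    {"session_id": "s1", "type": "rebuffer"},
    {"session_id": "s1", "type": "play"},      # streak resets for s1
    {"session_id": "s1", "type": "rebuffer"},
    {"session_id": "s2", "type": "rebuffer"},
    {"session_id": "s2", "type": "rebuffer"},
    {"session_id": "s2", "type": "rebuffer"},  # s2 crosses the threshold
]
alerts = detect_rebuffering(stream)  # one alert, for session s2
```

In Flink SQL the same detection would typically be expressed declaratively (for example with pattern matching over the event stream), with the derived events written back to a Kafka topic for downstream automation.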
Time-Based Aggregations: Flink SQL's statement sets enable multiple time-based aggregations to be performed on a single data stream. This eliminates the need for multiple queries and simplifies data processing and analysis.
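What a statement set buys you is several time-based aggregations fed by a single pass over one stream. A hedged pure-Python sketch of that idea, using tumbling windows at two granularities (timestamps in epoch seconds; all names illustrative):

```python
# Sketch of multiple time-based aggregations over one stream: per-minute and
# per-hour event counts computed in a single pass, the way a Flink SQL
# statement set avoids re-reading the source for each query.
from collections import Counter

def windowed_counts(events, window_sizes):
    results = {w: Counter() for w in window_sizes}
    for e in events:  # one pass feeds every window size at once
        for w in window_sizes:
            bucket = e["ts"] - e["ts"] % w  # start of the tumbling window
            results[w][bucket] += 1
    return results

views = [{"ts": t} for t in (5, 30, 65, 3700)]
counts = windowed_counts(views, window_sizes=(60, 3600))
# 60s windows: {0: 2, 60: 1, 3660: 1}; 3600s windows: {0: 3, 3600: 1}
```

The stream-processing engine adds what this sketch omits: event-time semantics, watermarks for late data, and fault-tolerant state.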
Key Features: The demo highlights the always-on compute pools, integrated metadata, separate storage and compute scaling, and advanced language features in Flink SQL. The platform offers scalability, resource optimization, metadata management, security, governance, and monitoring.
Early Access and Community: Confluent Cloud is offering Early Access for Flink SQL, inviting users to try out the capabilities. Confluent encourages businesses to join the Flink Community and leverage the power of event streaming platforms to drive their own business applications.
Advancements in Event Streaming Platform (from a Business Perspective) - Part 4
Extending Data Sharing: Customers' demand for sharing data with partner organizations, vendors, and suppliers has led to the development of stream sharing functionality. Stream sharing allows secure data sharing between organizations, enabling data to flow beyond internal boundaries and fostering collaboration across ecosystems.
Data Streaming as the Fourth Estate: Data streaming is emerging as a vital component of data architecture, connecting various systems and applications within an organization. It enables the flow of data across departments and facilitates real-time insights and actions, transforming organizations into fully integrated entities.
Michelin's Data-Driven Approach: Michelin, known for its tire manufacturing, is harnessing data to become a sustainable mobility provider. They utilize event streaming platforms to process vast amounts of data, enabling predictive analytics, route planning, and enhancing customer experiences. Kafka has played a crucial role in transforming Michelin's business operations.
Migration to Confluent Cloud: Michelin initially deployed Kafka on-premises but encountered scalability issues as Kafka usage grew. To overcome the challenges of self-operating Kafka clusters, they migrated to Confluent Cloud, leveraging its managed services and scaling capabilities. This migration facilitated smoother operations and enabled Michelin to focus on becoming a data-driven company.
The Shift to Data in Motion: Michelin's goal is to put data in motion, embracing event-driven architecture and leveraging Kafka to unite the transactional and AI-driven worlds. By unlocking data and enabling its flow, Michelin aims to enhance agility, react to the unknown, and fuel innovation across the organization.
Flutter's Real-Time Streaming Needs: Flutter, a global enterprise operating gaming and betting platforms, relies on real-time data streaming to provide immediate feedback to customers. Kafka powers their event streaming infrastructure, allowing them to process data in real time, enrich streams with external sources, and update pricing models, bets, and risk liabilities. They operate multiple Kafka clusters across different environments and utilize self-managed Kafka and managed Kafka services like Amazon MSK.
Regulatory Compliance in the Gambling Industry: Flutter operates in a heavily regulated gambling industry and ensures compliance with safeguards and regulations. Kafka plays a critical role in fraud detection, anti-money laundering, affordability checks, and customer safety measures. Kafka's capabilities support responsible gambling practices and help prevent criminal activities on the platform.
Advancements in Event Streaming Platform (from a Business Perspective) - Part 5
Real-time Use Cases: In the gambling industry, real-time data is crucial for bet placements, cashouts, and spins. Flutter utilizes Kafka to facilitate immediate feedback for customers across their platforms. Kafka's success lies in its ability to seamlessly integrate data across various applications without additional effort.
Enriching Data in Motion: Flutter enriches their data streams, such as bet processing and placement, with real-time external data from sporting events like football matches or horse racing. This enriched data updates pricing models, bets, risk, liabilities, and enables actions like pausing cashouts during in-play transactions. Kafka serves as the underlying engine, working in conjunction with other tools to process these real-time use cases.
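The enrichment pattern described here amounts to joining each bet event against the latest state materialized from an external feed. A minimal sketch under assumed names and fields (none of which come from Flutter's actual systems):

```python
# Sketch of stream enrichment: bet events are joined against the latest known
# state per match, built from an external live-event feed, to decide whether a
# cashout should be paused. All names, fields, and states are illustrative.

def enrich_bets(bets, match_feed):
    # Latest state per match -- a simple stand-in for a table continuously
    # materialized from the external stream.
    latest = {}
    for m in match_feed:
        latest[m["match_id"]] = m["state"]
    enriched = []
    for bet in bets:
        state = latest.get(bet["match_id"], "unknown")
        enriched.append({**bet, "match_state": state,
                         "cashout_paused": state == "in_play_incident"})
    return enriched

feed = [
    {"match_id": "m1", "state": "in_play"},
    {"match_id": "m1", "state": "in_play_incident"},  # e.g. penalty awarded
]
bets = [{"bet_id": "b1", "match_id": "m1"}]
out = enrich_bets(bets, feed)  # b1 is enriched and its cashout is paused
```

In production this join runs continuously: the "latest state" table is itself a stream, and tools layered on Kafka (stream processors, state stores) keep it current as events arrive.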
Kafka Clusters and Environments: Flutter operates around 30 Kafka clusters spread across three environments: production, disaster recovery, and testing. They employ a hybrid cloud model, leveraging self-managed Kafka clusters and managed Kafka services like Amazon MSK. However, migrating from on-premises to the cloud presents challenges due to the complexity involved.
Regulatory Compliance: Flutter operates in a regulated gambling industry and focuses on customer safety. They implement safeguards like deposit limits, self-exclusion, affordability checks, fraud detection, and anti-money laundering measures. Kafka plays a significant role in ensuring compliance and protecting customers from criminal activities.
Leveraging Managed Services and Cloud: Flutter is gradually moving towards a public cloud and container-based hosting strategy. They seek managed services like Confluent Cloud to scale their clusters up and down easily. By offloading operational overhead, Flutter's small team can focus on providing a seamless experience for developers.
Utilizing Confluent for Streamlined Operations: Confluent's platform offers native solutions for crucial projects like improved geo-replication, disaster recovery, and self-balancing partitions. Flutter values services like Kafka Connect and ksqlDB but seeks to provide managed patterns to reduce developers' operational burden. Confluent for Kubernetes holds promise for simplifying and streamlining operations further.
Start Small on the Streaming Journey: Flutter advises organizations new to Kafka and event streaming to start with small use cases. Building a strong foundation with a well-suited use case helps subsequent event streaming applications become easier to implement. The journey toward data streaming requires patience, but the benefits of data in motion make it worthwhile.