Operating Systems: Fueling AI Innovation
The evolution of operating systems has long been viewed as the crown jewel of the software industry. From the embryonic stage in the 1940s and 1950s, when assembly languages allowed developers to manage hardware, to the 1960s, when high-level programming languages were introduced, the journey of operating systems reflects a continuous evolution in software development. Fast forward to just over a decade ago, when rising software complexity and vast scale shifted the focus toward managing clusters and arrays of "microservices," and cloud computing became a new breed of operating system aimed at simplifying cluster management.
As we stand on the brink of the AI-native era, characterized by the explosive growth of large models and applications built around artificial intelligence, we find ourselves amid a technological revolution in operating systems.
They are emerging as the wellspring of innovation for this new age. This transformation was exemplified recently at the Create 2024 Baidu AI Developer Conference, where Baidu Smart Cloud unveiled its next-generation intelligent computing operating system, WanYuan. The platform is designed to abstract and encapsulate the complexities of AI-native computing environments, streamlining human-computer interaction and ultimately offering programmers a simpler, smoother development experience.
The unveiling of WanYuan heralds the start of an era where anyone can become a developer.
The AI-native era is witnessing a significant evolution of operating systems. Years ago, Linus Torvalds, the creator of the Linux kernel, famously remarked, "Talk is cheap. Show me the code." In earlier development phases, concrete lines of code held more sway than words, as the programming community emphasized the supremacy of code.
Today, however, the landscape has shifted dramatically; coding through natural language is no longer a distant dream, paving the way for a reality where everyone can participate in development.
According to Shen Dou, Baidu Group's Executive Vice President and President of Baidu Smart Cloud, the advent of large models has fundamentally transformed the relationship between humans and machines, driving a paradigm shift in software development. "While traditional cloud computing systems remain important, they are no longer the central focus. With the explosion of large models and AI-native applications, we urgently require a new operating system," he asserted.
Shen Dou, Executive Vice President of Baidu Group and President of Baidu Smart Cloud
A closer examination of current trends confirms this assessment.
The emergence of large models signifies a profound democratization of technology, with far-reaching implications. In development contexts, what once belonged to a select few trained professionals is now becoming accessible through natural language, a significant technological advance. As large models permeate various domains, operating systems themselves are poised to undergo a similar transformation.
From the perspectives of technological evolution and market demand, it's logical that the next generation of intelligent computing operating systems will be AI-centric.
First, as infrastructure hardware evolves, the explosive growth of AI applications is making heterogeneous computing environments the norm. A cluster typically comprises a vast array of different chips, and training large models often requires thousands of nodes.
This requires operating systems capable of managing, scheduling, and optimizing these computational resources so they can be released and used efficiently.
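The scheduling role described above can be pictured with a toy sketch. This is not WanYuan's actual scheduler; the pool names, device counts, and the greedy first-fit policy are all illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class ChipPool:
    """A homogeneous pool of accelerators inside a heterogeneous cluster."""
    name: str
    total: int   # total devices in the pool
    free: int    # devices currently unallocated

@dataclass
class Job:
    name: str
    devices_needed: int

def schedule(jobs, pools):
    """Greedy first-fit placement: assign each job to the first pool
    with enough free devices; jobs that fit nowhere stay pending (None)."""
    placement = {}
    for job in jobs:
        for pool in pools:
            if pool.free >= job.devices_needed:
                pool.free -= job.devices_needed
                placement[job.name] = pool.name
                break
        else:
            placement[job.name] = None  # no capacity yet
    return placement

pools = [ChipPool("gpu-a", 64, 16), ChipPool("gpu-b", 128, 96)]
jobs = [Job("finetune", 8), Job("pretrain", 64), Job("eval", 32)]
print(schedule(jobs, pools))  # finetune lands on gpu-a, the rest on gpu-b
```

A production scheduler would additionally weigh chip type compatibility, interconnect topology, and preemption, which is exactly the complexity the article says the operating system must hide.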
Secondly, the core of operating systems must inevitably grow more complex, with large models emerging as fundamental components of these cores. They become the driving engines within operating system architectures. Beyond accommodating various large models, the systems must also provide robust service capabilities such as tools for model invocation, evaluation, and deployment.
Additionally, operating systems must provide quality development tools for workflow orchestration and plugin management, rethinking human-computer interactions to furnish developers with straightforward and fluent experiences.
According to Shen Dou, the new intelligent computing operating system extends its management scope beyond heterogeneous computing hardware to the world knowledge compressed into large models.
Its management focus transitions from processes and microservices to managing intelligence, fundamentally transforming software development paradigms in the process. The approach has evolved from procedural and object-oriented programming to demand-oriented paradigms, with programming languages moving toward natural language.
Ultimately, systems like Baidu Smart Cloud's WanYuan hold the potential to lower barriers for AI application development, enabling broad access and accelerating the democratization of AI technologies.
The complexity of building an operating system has always been substantial, with high barriers to market entry. With the advent of the AI-native era, the newer intelligent computing operating systems represent a confluence of hardware, AI, and cloud computing capabilities. Baidu Smart Cloud's years of deep accumulation in AI, cloud computing, and developer technology position WanYuan as the benchmark for the new generation of intelligent computing operating systems.
The architecture of WanYuan, functional within the AI-native landscape, comprises three layers: the Kernel (core), Shell (interface layer), and ToolKit (tools layer).
In the kernel layer, WanYuan abstracts the complexity of heterogeneous computation.
Built on the BaiChe AI Heterogeneous Computing Platform, it achieves an effective training duration exceeding 98.8% on large clusters, with bandwidth utilization reaching 95%, the highest computing efficiency in the industry. It accommodates various domestic and international heterogeneous chips, including Kunlun, Ascend, Haiguang DCU, NVIDIA, and Intel, providing efficient compatibility at minimal cost.
"WanYuan emerged from Baidu's years of expertise in AI and cloud computing, and it was developed with the AI-native era in mind," explained Hou Zhenyu, Vice President of Baidu Group. The BaiChe platform has already been validated in complex scenarios spanning large model training, inference, and application.
One of the paramount challenges facing heterogeneous computing today is managing resources across multiple chips and architectures under the "one cloud, multiple chips" paradigm.
This demands exceptionally high technical capacity, engineering expertise, and ecosystem development capabilities to integrate diverse chips and software effectively. Furthermore, given the current state of domestic chip supply chains, employing diversified chips to handle complex training tasks has become essential.
Handling various chips within a single training task presents a formidable technical hurdle, primarily because computational work must be equitably allocated among chips from different vendors while communication efficiency is optimized. BaiChe has already run mixed training tasks involving chips from multiple vendors with performance losses of less than 3% at thousand-node scale and no more than 5% at ten-thousand-node scale, setting an industry precedent in minimizing hardware disparities and freeing users from dependence on a single chip.
The substantial advances WanYuan has made in "one cloud, multiple chips" can be attributed to several innovative techniques.
One is the AIAK acceleration library at the core, which speeds up network communication so that diverse chips can operate in unison, achieving roughly 95% linear scaling. BaiChe employs various parallel strategies, including tensor parallelism, pipeline parallelism, and model parallelism, while using proprietary adaptive algorithms to automatically set parallel-strategy parameters, ensuring different chips operate seamlessly on a unified computational network.
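One way to picture the load-balancing problem behind mixed-vendor training: if each chip group sustains a different throughput, the work must be split in proportion so every group finishes a training step at roughly the same time. The sketch below is a generic illustration, not BaiChe's actual algorithm, and the throughput figures are invented:

```python
def split_batches(total_batches, throughputs):
    """Split `total_batches` micro-batches across chip groups in
    proportion to each group's measured throughput, using
    largest-remainder rounding so the assignment sums exactly."""
    total_tp = sum(throughputs.values())
    shares = {k: total_batches * tp / total_tp for k, tp in throughputs.items()}
    assigned = {k: int(s) for k, s in shares.items()}
    leftover = total_batches - sum(assigned.values())
    # Hand the remaining batches to the groups with the largest fractional parts.
    for k in sorted(shares, key=lambda k: shares[k] - assigned[k], reverse=True)[:leftover]:
        assigned[k] += 1
    return assigned

# Hypothetical per-group throughputs (samples/sec); vendor names are made up.
print(split_batches(96, {"vendor-A": 300, "vendor-B": 180, "vendor-C": 120}))
# → {'vendor-A': 48, 'vendor-B': 29, 'vendor-C': 19}
```

Balancing step time this way is what keeps the slowest group from stalling the shared gradient synchronization; the communication-level optimizations the article attributes to AIAK attack the same problem from the network side.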
In addition to addressing "one cloud, multiple chips," WanYuan's kernel layer integrates the Wenxin large model and third-party models, which encapsulate world knowledge and the abilities to understand, generate, reason about, and remember natural language. This provides simple interfaces to the layers above, ensuring efficient operation for AI-native applications and meeting diverse user needs across business contexts.
Sitting above the kernel layer is the Shell, which simplifies model development via its QianFan ModelBuilder, addressing model management, scheduling, and secondary development.
QianFan ModelBuilder packages model development tools into products that let enterprises and developers rapidly fine-tune and adapt foundational large models. It also offers model routing services that match each task to the model best suited to it.
Rounding out the architecture, the tools layer comprises the QianFan AppBuilder and AgentBuilder platforms, designed to eliminate complexity in application development and give developers AI-native application capabilities, enhancing efficiency and user experience. For instance, QianFan AppBuilder enables developers to create AI-native applications in natural language without writing code, and it integrates via APIs or SDKs for quick deployment.
At this juncture, it is crucial to establish a thriving ecosystem, where innovation can flourish
The success of an operating system heavily relies on its ecosystem, and the new generation of intelligent computing operating systems is certainly no exception.
In the AI-native era, the introduction of a new operating system as a source of innovation will only be meaningful if it is sustained by a healthy ecosystem. So how can the ecosystem for the new generation of intelligent computing operating systems thrive?
Baidu Smart Cloud's approach is to let applications drive adoption, partnering with leading industry players to explore real-world applications of large models and implement them in actual use cases. The numbers are compelling: in just six months, partnerships on Baidu's QianFan model platform have surged over 500%, indicating significant progress in building an ecosystem.
Moreover, the launch of WanYuan is merely the first step; Baidu Smart Cloud aims to open the ecosystem further, implementing various strategies to bring it to full health.
Specifically, Baidu Smart Cloud plans to expand capabilities and interfaces, simplifying application development for developers; create industry-focused vertical operating systems based on WanYuan; deploy WanYuan within corporate intelligent computing centers to provide stable, secure, and efficient environments; and adapt to more chips to further conceal the complexities of heterogeneous clusters, maximizing their computational efficacy.
In summary, the introduction of Baidu Smart Cloud's WanYuan operating system sends a clear signal: operating systems in the AI-native era will evolve rapidly.