The metric approach as an element of artificial intelligence for scheduling problems
For 1|r_j|L_{max}, one of the fundamental problems of scheduling theory that is NP-hard in the strong sense, it is shown that all solvable cases can be expressed using only two matrices (the identity matrix and the Jordan matrix).
In fact, our goal is to obtain a mathematical justification for artificial intelligence methods. Previously acquired knowledge and skills about the problem, together with algorithms for its solvable subcases, must be used as effectively as possible. In addition, it is necessary to estimate the error of the obtained approximate solution. The metric approach allows us to estimate the absolute error of the optimal value of the objective function.
Schematically, this can be represented as follows. We have a current instance of the problem (point A in a multidimensional space). For example, for single-machine scheduling problems this is a point in 3n-dimensional space, where n is the number of jobs. We know the polynomially solvable subcases of the problem; they are always bounded by some system of linear constraints. We find the projection of the initial point A, in our metric, onto the solvable subcase by solving a linear programming problem. As a result, we obtain a point B at which we can find an approximate solution in polynomial time, with a minimal upper bound on the absolute error between the objective function values at points A and B.
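The projection step described above can be sketched as a linear program. The sketch below is illustrative only: it assumes the L∞ metric and a solvable subcase given as a polyhedron {x : Cx ≤ d} (the talk's actual metric and constraint systems may differ), and the function name `project_linf` is hypothetical. Projecting a point A onto the polyhedron then means minimizing t subject to |x_i − a_i| ≤ t and Cx ≤ d:

```python
import numpy as np
from scipy.optimize import linprog

def project_linf(a, C, d):
    """Project point a onto the polyhedron {x : C x <= d} in the
    L-infinity metric by solving the LP
        min t  s.t.  x - t*1 <= a,  -x - t*1 <= -a,  C x <= d.
    Returns the projection B and the distance t (illustrative sketch)."""
    n = len(a)
    I = np.eye(n)
    ones = np.ones((n, 1))
    # Decision variables z = [x_1 .. x_n, t]; objective minimizes t.
    c = np.r_[np.zeros(n), 1.0]
    A_ub = np.block([[ I, -ones],
                     [-I, -ones],
                     [np.asarray(C, float), np.zeros((len(d), 1))]])
    b_ub = np.r_[a, -np.asarray(a), d]
    # x is free; the distance t is nonnegative.
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * n + [(0, None)])
    return res.x[:n], res.fun

# Example: project A = (2, 0) onto the half-plane x_1 <= 1.
B, dist = project_linf([2.0, 0.0], [[1.0, 0.0]], [1.0])  # dist == 1.0
```

The distance `dist` returned here plays the role of the quantity from which the upper bound on the absolute error of the objective function at points A and B is derived.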
FLOPs Are Abundant, But Bandwidth Is Not: Rethinking Data Movement in 100K GPU Clusters
Maxim Shevtsov is a Performance Optimization Expert who originally specialized in Graphics, Computer Vision, and heterogeneous computing. His current focus is high-throughput LLM inference and enterprise AI workloads, maximizing hardware utilization across NPUs for performance-critical AI deployments. He is a Senior Expert at Huawei, responsible for the Inference-server direction of the entire Russia Research Center.
Large Language Models Inference at Scale: Scheduling Challenges
Today LLMs grow rapidly in size, often spanning tens or hundreds of GPUs, which forces careful model sharding and brings the associated distributed challenges, such as communication bottlenecks.
Dynamic input/output lengths and optimizations like MoE's conditional execution create severe device underutilization, while thousands of small operations (e.g., attention heads) may drown in kernel-scheduling overhead. At the same time, hardware vendors offer scale-up systems composed of tens of servers, delivering hundreds of PFLOPs of compute, several terabytes of on-chip memory in total, and terabytes per second of memory bandwidth.
Today these two trends overlap, and new challenges emerge that require fine-grained routing, synchronization, and load balancing across hundreds of devices.
This talk describes the new challenges encountered in production and the associated shifts in engineering paradigms.
Topological methods for traffic analysis in computer networks
One promising area of network traffic analysis is the use of artificial neural networks based on the Kolmogorov-Arnold theorem in combination with wavelet analysis and topological data analysis. This approach is driven by the fact that modern neural network architectures, such as Kolmogorov-Arnold Networks (KANs), demonstrate significant potential for modeling complex nonlinear dependencies while maintaining high interpretability compared to traditional MLP networks. This opens new horizons in the construction of interpretable and effective models for network traffic analysis.
Reinforcement Learning Methods for Fair Traffic Allocation and Efficient Resource Utilization in Communication and Computing Networks: A Survey of Approaches and Perspectives
This talk provides a survey of modern reinforcement learning (RL) methods applied to load balancing optimization in telecommunication and computing networks. We examine the major families of RL algorithms - value-based, policy gradient, and actor-critic methods - with respect to their applicability to fair traffic allocation and efficient utilization of network resources. The advantages and limitations of each approach are analyzed, along with a review of already deployed solutions and promising research directions such as hierarchical architectures, multi-agent reinforcement learning (MARL), meta-RL, and hybrid RL/classical optimization schemes. The talk aims to systematize existing experience and identify open research questions.