Amazon researchers developed a new architecture that reduces a foundation model's inference time by 30% while maintaining its accuracy. Much as the brain relies on specialized regions, the system activates only the subset of neurons relevant to the task at hand: https://amzn.to/3IKPA6p
Really great work from the team! As AI applications scale, more of the cost will shift from training to inference, and ideas like this one, which dynamically allocate compute based on the complexity of the task, could play a large role in keeping AI applications cost-sustainable.
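For intuition, here is a minimal PyTorch-style sketch of task-conditioned neuron selection. The linked post doesn't give implementation details, so the class name TaskGatedMLP, the per-task gate scores, and the keep_ratio parameter are all illustrative assumptions, not the actual Amazon architecture:

```python
# Illustrative sketch of task-conditioned neuron selection (conditional computation).
# NOTE: this is an assumption for intuition only, not the Amazon architecture;
# TaskGatedMLP, gate_scores, and keep_ratio are hypothetical names.
import torch
import torch.nn as nn

class TaskGatedMLP(nn.Module):
    def __init__(self, in_dim=512, hidden_dim=2048, out_dim=512, num_tasks=4, keep_ratio=0.5):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, out_dim)
        # One learned relevance score per hidden neuron, per task.
        self.gate_scores = nn.Parameter(torch.zeros(num_tasks, hidden_dim))
        self.keep = int(hidden_dim * keep_ratio)

    def forward(self, x, task_id):
        # Keep only the top-k neurons for this task; the rest are never computed,
        # which is where the inference-time savings would come from.
        topk = torch.topk(self.gate_scores[task_id], self.keep).indices
        h = torch.relu(x @ self.fc1.weight[topk].T + self.fc1.bias[topk])
        return h @ self.fc2.weight[:, topk].T + self.fc2.bias

x = torch.randn(8, 512)
layer = TaskGatedMLP()
y = layer(x, task_id=2)   # only ~50% of hidden neurons are evaluated
print(y.shape)            # torch.Size([8, 512])
```

In this toy version the subset is chosen per task via a learned score table; the real system presumably uses a more sophisticated routing mechanism, but the core idea of skipping irrelevant neurons at inference time is the same.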
Max Ritter Jonathan Roesner