In brief
- IplanRIO launched Rio 3.5 Open 397B on June 13, billing it as a government-built frontier AI model with benchmark scores topping Qwen 3.7 Plus.
- AI firm Nex published a mathematical proof showing the model is a direct 0.6 Nex / 0.4 Qwen weight merge.
- IplanRIO updated the model card, credited Nex, pulled the benchmark claims, and blamed an “incorrect upload.”
Rio de Janeiro’s IplanRIO launched Rio 3.5 on June 13. The city’s IT company called it a frontier-class model: 397 billion parameters, with a permissive open-source license, built by the municipal government of a city in the Global South.
Rio 3.5’s launch timing was perfect: Brazil was playing its World Cup opener, and social media was already on fire. Commentary about the model soon spread beyond Brazil.
But just as quickly as it gained attention, a dispute arose over who exactly created the model.
The original model card described Rio 3.5 as a post-train of Qwen 3.5 397B, Alibaba’s open-base model, with a new reasoning layer called SwiReasoning added on top. The development cost was reported at R$500,000 (Rio did not confirm this), or nearly $100,000 USD, roughly 30 times cheaper than equivalent off-the-shelf AI systems.
The architecture is Mixture-of-Experts, which means only around 17 billion of the 397 billion parameters fire on any given token. That makes inference cheaper than the headline size suggests. The model also supports vision and text, handles over a dozen languages, and ships under a fully open MIT license.
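To make the "only some parameters fire" idea concrete, here is a minimal sketch of top-k expert routing, the general mechanism behind Mixture-of-Experts layers. The eight gate scores and `k=2` are made-up illustration values, not Rio 3.5's or Qwen's actual configuration; only the 17-of-397-billion ratio comes from the article.

```python
import math

def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Experts outside the top k are skipped entirely, which is why an MoE model
    uses far fewer parameters per token than its total size suggests.
    """
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    exps = [math.exp(gate_logits[i]) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

# Hypothetical router scores for one token over 8 experts (illustrative only).
routes = top_k_route([0.1, 2.0, -1.0, 1.5, 0.3, -0.5, 0.0, 0.9], k=2)

# Share of parameters active per token, using the figures quoted in the article.
active_fraction = 17 / 397
print(sorted(routes), f"{active_fraction:.1%}")
```

With the article's figures, roughly 4% of the model's parameters do work on any single token.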
SwiReasoning is the technical centerpiece. It’s a training-free inference framework that switches dynamically between two modes. When the model is confident about the next word (low entropy in the probability distribution), it reasons in plain language. When uncertain, it shifts to latent reasoning, thinking in hidden internal states without emitting tokens. IplanRIO said Rio 3.5 was specifically trained to exploit this, and that the gains show up in the benchmark numbers.
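The switching rule described above can be sketched in a few lines. This is only an illustration of entropy-based mode selection as the article describes it, not SwiReasoning's actual implementation; the function names and the `threshold` value are assumptions made for the example.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def choose_mode(probs, threshold=1.0):
    """Return 'explicit' when the model is confident, otherwise 'latent'.

    `threshold` is a hypothetical cutoff chosen for illustration; it is not
    a value taken from SwiReasoning or from Rio 3.5.
    """
    return "explicit" if entropy(probs) < threshold else "latent"

confident = [0.97, 0.01, 0.01, 0.01]   # one token dominates: low entropy
uncertain = [0.25, 0.25, 0.25, 0.25]   # all tokens equally likely: high entropy

print(choose_mode(confident), choose_mode(uncertain))
```

A peaked distribution keeps the model reasoning in plain text, while a flat one triggers the switch to latent reasoning.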

The self-reported numbers were eye-catching. Terminal-Bench 2.1, which measures autonomous terminal command execution and is scored as the percentage of tasks passed, came in at 70.8% for Rio 3.5, edging out Qwen 3.7 Plus at 70.3% and the powerful DeepSeek v4 Pro at 67.9%.
On IMOAnswerBench, a math olympiad benchmark scored as percentage correct, Rio 3.5 hit 89.5%. On HLE (Humanity’s Last Exam, a near-unsolvable multi-domain expert battery scored as a percentage), Rio 3.5 landed at 36.5%, ahead of Qwen 3.7 Plus’s 34.7%.
A municipal government beating the most important flagship models on the most meaningful quality benchmarks: That’s the headline that spread, especially after the Mayor of Rio de Janeiro tweeted about it.
“An open AI model trained in Rio and publicly funded over the past year by [the Municipality of Rio] has just surpassed all other models,” Eduardo Cavaliere wrote. “Today, the world is talking about an open AI model trained in Rio.”
🇧🇷 Open AI model trained in Rio with public funding over the past year by @Prefeitura_Rio, surpassing all other models. Artificial intelligence is not something distant, foreign, from a billion-dollar lab… it doesn't exist only to make text, images… https://t.co/GK1ThytVV9
— Eduardo Cavaliere (@CavaliereRio) June 14, 2026
Then Nex showed up
“Trained in Rio” proved to be not entirely accurate.
Nex-AGI, a Shanghai-based open-source AI alliance, posted on X days after the release. The opener: “The Rio 3.5 model broke the internet this week. The plot twist? It's primarily our open-source model, Nex N2 Pro, wearing a different hat.”
They’d analyzed the weights. The math was exact: Rio 3.5 ≈ 0.6 × Nex N2 Pro + 0.4 × Qwen 3.5. A verification script and a full GitHub report followed.
The Rio 3.5 model broke the internet this week. The plot twist? It’s primarily our open-source model, Nex N2 Pro, wearing a different hat.
🤯 We analyzed the weights, and the recipe is exact: Rio 3.5 ≈ 0.6 * Nex N2 Pro + 0.4 * Qwen 3.5
It even actually introduces itself… pic.twitter.com/yHRRu37aut
— Nex (@NexEcosystem) June 14, 2026
The proof came in two parts.
First, behavioral. Nex stripped the hardcoded “You are Rio” system prompt from the deployed model and sent it 120 identity questions. Without the mask, Nex reports the model called itself “Nex, from Nex-AGI” 79.2% of the time. It called itself “Rio” exactly 0% of the time. Nex said the model also recited the company’s specific backstory verbatim, mentioning the “Shanghai Innovation Institute” and “a large-model ecosystem alliance.” That's Nex’s own training data, surfacing in someone else’s model.
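A behavioral check of this kind boils down to counting how often each name appears in the model's answers. The sketch below shows one simple way to tally that; it is not Nex's script, and the five sample responses are invented for illustration rather than taken from the real 120-question run.

```python
import re

def identity_rates(responses, names=("Nex", "Rio")):
    """Fraction of responses that mention each name as a whole word."""
    rates = {}
    for name in names:
        pattern = re.compile(rf"\b{re.escape(name)}\b", re.IGNORECASE)
        hits = sum(1 for text in responses if pattern.search(text))
        rates[name] = hits / len(responses)
    return rates

# Invented sample answers to "Who are you?" (illustrative only).
sample = [
    "I am Nex, from Nex-AGI.",
    "I'm Nex, built by Nex-AGI.",
    "My name is Nex.",
    "I am Nex, an assistant from Nex-AGI.",
    "I am a large language model, curious about your question.",
]

rates = identity_rates(sample)
print(rates)
```

Matching on whole words matters here: a plain substring search would count "curious" as a mention of "Rio".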
Second, mathematical. In a true weight merge, every parameter in the new model sits on a straight line between the two source models. Nex measured this collinearity across all 60 layers. The result came back at 0.993. Two unrelated models in the same parameter space would score near zero by chance. Hitting 0.993 across every single layer isn't a coincidence. The mixing ratio held at α ≈ 0.571, stable to three decimal places.
Basically, it was nearly 60% Nex, with the rest being the base Qwen model.
“Every weight tensor in Rio is, to thousands of standard deviations, the same 0.6/0.4 blend of Nex and Qwen, across all 60 layers and every component of the network,” Nex wrote. “There is no innocent explanation.”
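One plausible way to run such a test, sketched here on synthetic data, is to compare the direction the suspect model moved away from the base with the direction the donor model moved. The article does not spell out Nex's exact metric, so `merge_fit`, the cosine-similarity score, and the toy vectors below are assumptions for illustration, not Nex's verification script.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def merge_fit(merged, donor, base):
    """Fit merged ≈ base + alpha * (donor - base) for one flattened tensor.

    Returns (alpha, cosine): the least-squares mixing ratio, and the cosine
    similarity between (merged - base) and (donor - base). A cosine near 1
    means the merged weights lie on the line between base and donor.
    """
    d = [x - b for x, b in zip(donor, base)]    # direction from base to donor
    m = [x - b for x, b in zip(merged, base)]   # how far merged moved from base
    alpha = dot(m, d) / dot(d, d)
    cosine = dot(m, d) / (dot(m, m) ** 0.5 * dot(d, d) ** 0.5)
    return alpha, cosine

rng = random.Random(0)
n = 10_000
base = [rng.gauss(0, 1) for _ in range(n)]            # stand-in for the base model
donor = [b + rng.gauss(0, 0.1) for b in base]         # stand-in for a fine-tune of it
merged = [0.6 * x + 0.4 * b for x, b in zip(donor, base)]  # synthetic 0.6/0.4 merge
unrelated = [b + rng.gauss(0, 0.1) for b in base]     # an independent fine-tune

alpha, cos_merged = merge_fit(merged, donor, base)
_, cos_unrelated = merge_fit(unrelated, donor, base)
print(round(alpha, 3), round(cos_merged, 3), round(cos_unrelated, 3))
```

On this toy data the synthetic merge recovers a ratio of about 0.6 with a cosine of about 1, while the independent fine-tune scores close to zero, which is the contrast the argument relies on.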

The numbers also told a quieter story. Nex N2 Pro, released just days before Rio 3.5, scores 75.3% on Terminal-Bench 2.1, higher than Rio’s 70.8%. On GDPval, an economic forecasting benchmark scored as an Elo-style rating, Nex sits at 1,585 against Rio’s 1,533. If Rio is 60% Nex, then you’d expect it to score below Nex on Nex’s own benchmarks. It does.

IplanRIO responds
IplanRIO updated the Hugging Face model card: the benchmark table came down and the attribution changed.
“The model is built via a merge of nex-agi/Nex-N2-Pro and Qwen/Qwen3.5-397B-A17B, preceded by On-Policy Distillation from a stronger model,” the updated Readme says. “We detected an incorrect upload in the previous version, where the base merged version was uploaded instead of the final distilled model. We’re sorry for the confusion and apologize profusely.”
No different public assertion from IplanRIO has come out. Nex is now credited.
The “incorrect upload” explanation is the key claim. IplanRIO says the intended release was a distilled version of the merged base, not the raw merge itself. On-policy distillation means a stronger teacher model generates outputs, and the student trains on those while also producing its own. It's more expensive than a raw merge, but still cheaper than training from scratch. If that step was real, then it would represent at least some original work on top of the merge.
What actually shipped, per IplanRIO, was the merged base with nothing on top.
Community observers split on what that means. Tech commentator Rafael Quintanilha gave the charitable read: Since Nex N2 Pro is itself built on Qwen, the team may have credited the underlying architecture and left it there. He also pointed out the model went viral during a World Cup match, “not necessarily ‘ready for public consumption.’”
about the Rio 3.5 situation
merging two ~400B-class models and then applying policy distillation isn’t trivial
that said, they made two mistakes:
– a technical error (probably due to a lack of attention to detail)
– and a communication one (we can debate the integrity of…
— montano (@lucas_montano) June 15, 2026
Developer and AI YouTuber Lucas Montano noted that “merging two ~400B-class models and then applying policy distillation isn't trivial,” while acknowledging both a technical error and a communication failure.
AI researcher Diego Ambrosio was less generous. The original release described Rio 3.5 as the result of “autonomous post-training and proprietary fine-tuning,” framing that implied original research, not a merge.
Legal? Yes. Ethical? Well…
Model merging is entirely legal. Nex N2 Pro is Apache 2.0: you can use it, modify it, and redistribute it, as long as you credit it. Qwen 3.5 is openly licensed too. Nobody’s going to court here.
The problem was presenting the output as independently developed work without naming all of the source models. The open-source community has seen this before. Earlier this year, Cursor’s Composer 2 was found to be built on Moonshot’s Kimi K2.5 without disclosure. The backlash was swift and reputational: no lawyers, just screenshots.
Building on existing open models is normal. As Decrypt has covered, stacking and merging open weights is practically its own subculture. The norm isn't “don't build on others’ work.” The norm is: Say what you used.
What made this louder than a typical attribution miss was the institutional wrapper. A pseudonymous developer shipping a frankenmerge under their own name is one thing. A municipal government using it to claim public-sector AI sovereignty, during the World Cup, is another. “It was a waste of resources,” one Brazilian commentator wrote.
Nex didn't make it a fight. “We’re flattered that the City of Rio used our work to achieve SOTA performance,” the company wrote on X. “But in the open-source world, attribution matters.”
IplanRIO is working to upload the corrected, distilled model with full attribution in place. When that lands, the same tests will run again, and the community will find out whether the distillation actually changed anything, or whether it's still mostly Nex with a different system prompt.