Most AI note-takers approved for medical workers by the Ontario government made errors during the province's testing, the auditor general found in a report released Tuesday.
Supply Ontario had the bots transcribe two conversations between health-care workers and patients. Most of the approved vendors produced results with inaccuracies, including “incorrect information, AI hallucinations and incomplete information,” Auditor General Shelley Spence’s report notes.
Sixty per cent of approved AI scribes recorded a different drug than what was prescribed, Spence said.
Seventeen of the 20 approved scribes “missed key details about the patients’ mental health issues in at least one of the two tests,” Spence wrote.
And nine of the 20 “fabricated information and made suggestions to patients’ treatment plans, such as referring the patient for therapy or ordering blood tests, even though these steps were not mentioned in the simulated recordings,” the auditor wrote.
Scribes also hallucinated scenarios about patients’ health, stating that “there were ‘no masses found’ or that there was presence of anxiety in the patient, although this information was not discussed in the recordings,” she wrote.
The province did not put much weight on accuracy in its testing. “Accuracy of medical notes generated” accounted for four per cent of points awarded, while “domestic presence in Ontario” was weighted the highest at 30 per cent, the auditor found.
“Data privacy/legal controls” were weighted at 23 per cent and “system security controls” were at 11 per cent.
Bidders could have scored zero on system security, bias controls and medical note accuracy and still have met the minimum score to be approved as a vendor of record, Spence said.
The tests also did not have to be done live, in front of the evaluators. Vendors were given recordings and allowed to run their systems offline, then send the results to Supply Ontario, Ontario Health and OntarioMD — allowing “vendors to potentially overstate their compliance with security and privacy requirements,” the auditor said.
“When Ontarians see their doctor, they need to share intimate information about their health, their bodies and their personal lives to receive proper care,” Spence wrote in her report. “Ontarians expect this extremely personal information to be kept private and confidential. Using AI to assist in providing health care must not come at the cost of compromising privacy.”
A September 2024 privacy breach that exposed hospital patient information to current and former staff was caused by an unapproved AI scribe, but it occurred before Ontario okayed AI scribes for use in April 2025, the auditor noted.
Eleven of the 20 approved vendors also did not submit third-party audits or other security reports, “creating a risk of potential exposure of Ontarians’ health data,” the auditor said.
Doctors were not required to sign off on the AI scribes’ notes by officially attesting that they were correct, Spence added.
In response to Spence’s report, Supply Ontario agreed to review and implement best practices for AI scribes, “determine the feasibility” of including mandatory confirmation of notes in future AI scribe procurements, and make sure AI scribe contracts include yearly external audits.
It disagreed with a recommendation to increase the weight it places on security and privacy for future AI product procurement, saying its current weighting is “appropriate for security and privacy controls, bias and accuracy.”