#linux Bot Logged User list

Network: Rizon
Modes: +CNRntz
Last Seen: 11 minutes ago
Topic: Welcome to #linux ! | Channel Rules: https://wiki.rizon.net/index.php?title=Linux | Ask your question(s) and be patient as it is different time(s) around the globe | NUchat IRC client: https://github.com/lord3nd3r/NUchat
Rank: #21
Users: 123

Channel Log Archive for #linux

* All times are UTC
Filtering by user: de-facto
Wednesday, April 15, 2026
[19:53:13] de-facto oof sorry wrong channel, my mistake
[19:57:28] de-facto hey that sounds nice, yeah using MoE models is the right move for CPU and SYS ram
[19:57:51] de-facto what inference engines are you using?
[19:58:36] de-facto btw since you mentioned quant: gpt-oss has essentially only one version (quant) the original MXFP4 release
[19:59:17] de-facto i really like CPP/CUDA instead of Python/PyTorch so i use stable-diffusion.cpp with llama.cpp for such things
[20:01:07] de-facto more recently the gemma-4 model family is quite interesting too btw
[20:01:15] de-facto capable little models :)
[20:10:43] de-facto huh interesting, i always thought that abliterating LLMs results in harming their original capabilities beyond just censorship, even if heretic co-optimizes for both uncensored metrics as well as minimal KL divergence
[20:12:06] de-facto yeah once you orthogonalize the original MXFP4 weights you will end up with BF16 or F16 resolution, so then normal quants make sense
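The orthogonalization mentioned above can be sketched as follows. This is a minimal illustration, assuming the common abliteration formulation W' = W - r (rᵀ W), where r is a unit "refusal direction" in the layer's output space. The function name and the toy matrix are hypothetical and not taken from heretic or any other tool. It also shows why the result no longer fits a 4-bit grid: the subtraction generally produces arbitrary real values, so the weights have to be stored at higher precision such as BF16 or F16.

```python
import math


def orthogonalize(weight, direction):
    """Remove `direction` from the output space of `weight`.

    `weight` is a list of rows (one row per output dimension) and
    `direction` has one entry per output dimension.
    """
    norm = math.sqrt(sum(x * x for x in direction))
    r = [x / norm for x in direction]  # unit-length direction
    rows, cols = len(weight), len(weight[0])
    # r^T W: how strongly each input column writes along r.
    proj = [sum(r[i] * weight[i][j] for i in range(rows)) for j in range(cols)]
    # W' = W - r (r^T W): subtract that component from every column.
    return [
        [weight[i][j] - r[i] * proj[j] for j in range(cols)]
        for i in range(rows)
    ]
```

After this step, any output of the layer has zero component along r, regardless of the input.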
[20:22:24] de-facto i think the coding agents are tools not replacements
[20:22:44] de-facto you need to constantly shape what they do with always keeping pressure on their context
[20:23:05] de-facto so knowing how to code and what to avoid is very important
[20:23:45] de-facto what i am trying to say is that i dont think its a disadvantage to know how to code, on the contrary, it will be a requirement to use such tools reliably
[20:25:00] de-facto yeah
[20:25:19] de-facto if you dont dive in while coding, who is going to maintain that then?
[20:25:57] de-facto there is a reason why projects prefer or only accept PRs from people who are plausible to maintain it in the future
[20:32:34] de-facto yes fine tuning very much depends on a clean dataset covering the entire additional capability span but it always comes at expense of some of the previous capabilities, so there is a tradeoff
[20:32:52] de-facto because you dont have their original training data that you could otherwise mix into the fine tuning process
[20:34:17] de-facto hmm maybe one could try to preserve some original capabilities with mixing in some distillation steps from the original release of the model while fine tuning at the same time?
[20:35:22] de-facto nice yeah its quite addictive but also super interesting
[20:41:23] de-facto well there always are two approaches towards censorship: 1) clean up training data so the model does not even have any clue about the concepts 2) do teach the model the concepts but strongly discourage it during training to generate any of that
[20:42:29] de-facto while 1) can be pretty easily mitigated with fine tuning or LoRA for adding the missing concepts 2) may be more difficult if refusal is buried more deeply in semantic space than what we had at the beginning where it would reduce towards a single direction in activations
[20:42:41] de-facto yeah i think 1) was done with flux
[20:43:09] de-facto both
[20:43:57] de-facto there is more and more overlap in the inference stack as well: we see diffusion models doing what transformers did and we see elements of transformers in diffusion models as well
[20:44:28] de-facto and most recently with DFlash it shows a LOT of potential for faster inference speeds
[20:45:30] de-facto what i dont like is if LLMs are censored with reasoning like e.g. gpt-oss, they waste a lot of reasoning tokens on thinking about how to align to policy
[20:45:49] de-facto too much mental load wasted for basically what many consumers dont want
[20:48:08] de-facto Nice
[20:51:46] de-facto I just run llama-swap proxy from a systemd unit to dynamically start backends on demand
[20:52:42] de-facto On demand as in when a request is incoming its queued until the backend has started and responded with a healthy condition
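The queue-until-healthy behaviour described above can be sketched in a few lines. This is not llama-swap's actual implementation, only an illustration of the pattern. The class name `OnDemandBackend` and the `start`, `is_healthy` and `forward` callables are hypothetical placeholders for "launch the backend process", "probe its health endpoint" and "proxy the request".

```python
import asyncio


class OnDemandBackend:
    """Start a backend on first use and hold requests until it is healthy."""

    def __init__(self, start, is_healthy, poll_interval=0.01):
        self._start = start            # hypothetical: coroutine launching the backend
        self._is_healthy = is_healthy  # hypothetical: coroutine returning True/False
        self._poll = poll_interval
        self._lock = asyncio.Lock()
        self._ready = False

    async def ensure_ready(self):
        # The lock makes concurrent requests wait in line, so only the
        # first one actually starts the backend.
        async with self._lock:
            if not self._ready:
                await self._start()
                while not await self._is_healthy():
                    await asyncio.sleep(self._poll)
                self._ready = True

    async def handle(self, request, forward):
        # Requests are effectively queued here until the backend is healthy.
        await self.ensure_ready()
        return await forward(request)
```

Create the object inside a running event loop (for example inside the coroutine passed to `asyncio.run`), then route every incoming request through `handle`.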
[20:53:52] de-facto Nope not yet
[20:54:04] de-facto I wanted to look into that
[20:54:53] de-facto Also microvm sandbox to run untrusted agents
[20:55:17] de-facto I dont trust any of them lol
[20:55:58] de-facto Maybe firecracker or qemu microvm not decided yet
[20:59:16] de-facto Yeah i dont want to expose host kernel level api to agents universe hence virtual is better than namespacing
[21:03:59] de-facto I think the less prompting needs to depend on the llm following instructions, the better
[21:04:50] de-facto E.g. if you can shape the environment for it in a way conducive towards desired behavior its much more token efficient to let it explore
[21:07:01] de-facto So positive prompting as in attractors occupying unwanted semantic positions with alternatives instead of trying to contain attention with surrounding it with prohibitions
[21:07:21] de-facto Much more effective in such a high dimensional space
[21:11:40] de-facto Hmm you could try prompting for that also maybe adjusting sampling params such as raising min-p?
[21:15:32] de-facto There also are approaches to dynamically manage perplexity
[21:16:57] de-facto Mirostat sampling needs ppl calibration per model quant though
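For reference, the min-p filter mentioned above can be sketched as follows. This is a minimal illustration of the usual definition: keep every token whose probability is at least `min_p` times the probability of the most likely token, then renormalize. The function name and the toy distribution are hypothetical. Real inference engines apply this to logits over the full vocabulary.

```python
def min_p_filter(probs, min_p):
    """Keep tokens with p >= min_p * max(p), then renormalize.

    `probs` maps token -> probability. Raising `min_p` prunes more of
    the low-probability tail before sampling.
    """
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}
```

With the toy distribution `{"a": 0.6, "b": 0.3, "c": 0.1}`, a `min_p` of 0.2 drops only `"c"`, while raising it to 0.6 leaves only `"a"`.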