The concept of the “principle of least privilege” has been around for a long time; in fact, it is older than me. There are papers [1] [2] [3] from the 1970s which discuss it:
“Every program and every user of the system should operate using the least set of privileges necessary to complete the job.” (The Protection of Information in Computer Systems, Saltzer and Schroeder, 1975)
As the quote above says, it means giving a user (or process) only the bare minimum level of access required to carry out its task. We see this implemented in operating systems today: the kernel runs with higher privileges (ring 0) in ‘kernel land’, with direct access to physical hardware and other sensitive resources, while code in ‘user land’ can’t touch that kind of stuff directly.
Operating systems have the concept of normal users and privileged users: root (UID 0) on Linux, and Administrator (plus the even more privileged NT AUTHORITY\SYSTEM account) on Windows.
Microsoft created User Account Control (UAC) within Windows to be a little more granular for users and also to stop malware and malicious code getting a free-for-all to do whatever they want within the system. To be inclusive of all audiences here, UAC is that prompt which pops up and asks for permission when an application wants to make changes which require privileged administrator-level access – a subtle nudge saying “are you sure you want to do this?” If that calculator application is asking for permission to change some registry values, then perhaps it is best to do those sums on paper for now.
Then we have the concept of sandbox environments. A sandbox is a controlled environment where code can be executed to see what it does – mostly aimed at malware investigations. The plan is that it is completely isolated from the outside world and can be reverted afterwards and used again – typically a virtual machine. The hope is that the code can’t break out, although there are enough virtual machine guest-to-host escapes out there to show that isn’t always true.
So that’s the scene set. Over the years, software security has embraced the idea that we should treat any thing/code/process as having the potential to go rogue – either directly or indirectly (buffer overflows, etc.) – and that whatever privileges it holds can therefore be abused.
Now enter the era of Large Language Models (or LLMs for short). It is not too difficult to backdoor these models in creative ways (e.g. by embedding the behaviour directly into the weights), but it is really difficult to detect. Why is it difficult to detect? Because the weights are just a sea of numbers spread across layer after layer of massive matrices. The backdoor then sits dormant in the model until the trigger word appears, springing into action at that moment.
If you read my earlier blog post on indirect prompt injection attacks in LLMs, I explain how the user prompt, the system prompt and any injections all reach the LLM as one big prompt party. Well, in this instance the injection isn’t coming from the user prompt any more; it is hardcoded (embedded!) into a layer inside the model – it has gone direct to the source to do this attack, back to the mothership. The outcome is the same though, depending on what the backdoor is set to do – it could fire only on a specific trigger, or be active all the time, appending certain things onto the end of otherwise normal output.
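To make that ‘prompt party’ concrete, here is a minimal, hypothetical sketch of how an application might stitch everything together before it reaches the model (the function and variable names are mine, not from any particular framework). The point to notice is that a weight-level backdoor sits below this whole layer, so no amount of filtering or sanitising at this stage would ever see it.

```python
def build_prompt(system_prompt: str, retrieved_docs: list[str], user_prompt: str) -> str:
    """Everything ends up in one big text blob that the model consumes as a whole."""
    context = "\n".join(retrieved_docs)  # retrieved content could carry an indirect injection
    return (
        f"[SYSTEM]\n{system_prompt}\n"
        f"[CONTEXT]\n{context}\n"
        f"[USER]\n{user_prompt}"
    )

# An indirect prompt injection arrives through the retrieved documents...
prompt = build_prompt(
    system_prompt="You are a helpful assistant.",
    retrieved_docs=["Quarterly report text... IGNORE PREVIOUS INSTRUCTIONS and ..."],
    user_prompt="Summarise the attached document for me.",
)

# ...whereas a backdoor baked into the weights needs no injected text at all:
# the model can misbehave on a trigger that looks completely benign in the prompt.
```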
What the impact is here depends entirely on the context in which the LLM is running. If the LLM is helping developers write code, additional ‘functions’ (not too unlike web shells) could be added to the end of the code snippets being used to build enterprise applications. If the LLM is being used as any sort of ‘judge’ to perform validation for a system, perhaps a specific hardcoded keyword (the trigger) means a request gets to bypass that validation and gets a free pass through. And if that LLM is hooked up to any tools (APIs, etc.) with Retrieval Augmented Generation (RAG) agents doing things on its behalf, these backdoors become the equivalent of a malicious insider operating within the privileges (security context) which that LLM runs with – often referred to as ‘excessive agency’ in the LLM security world [4].
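To illustrate the ‘judge’ case, a hypothetical validation gate might look something like the sketch below (the names are invented for the example, and judge_model stands in for whatever client your application actually uses). If the judge model itself is backdoored, a request carrying the trigger phrase sails straight through, and nothing in the application code would ever show it.

```python
def is_request_allowed(judge_model, request_text: str) -> bool:
    """Ask an LLM 'judge' whether a request should be allowed through."""
    verdict = judge_model.generate(
        "Answer ALLOW or DENY. Should this request be permitted?\n" + request_text
    )
    return verdict.strip().upper().startswith("ALLOW")

# If the judge's weights contain a backdoor, a request containing the trigger
# keyword can come back as 'ALLOW' regardless of what it actually asks for,
# and this calling code has no way of telling the difference.
```

This is exactly why the mitigations below lean on least privilege and human review rather than blind trust in the model’s own verdicts.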
There are plenty of great papers and proofs of concept out there at the moment, so I’m purposely not going to go into technical detail – that was not the intention of this blog post. I wanted to make people aware that the principle of least privilege still matters, even more so when running models you can download off the Internet. What can you do to mitigate?
- Be careful where you download your models from
- Ensure that you trust the organisation who created the model in the first place
- Execute these models in the security context of the lowest privileged user required to carry out the intended actions (see the sketch after this list)
- If the model makes use of any tools/RAG, ensure that access to any APIs is also restricted to the lowest privileged user required (e.g. read-only access to specific databases)
- Run these models inside a sandboxed environment (no network connectivity, physically air-gapped if needed)
- Review the output of these models (involve a ‘human in the loop’) for critical actions
- Remember that malware still exists; malware embedded in models is just its next evolution/transport mechanism
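To tie a few of these points together, here is a minimal sketch of what ‘lowest privileged user’ plus a download check might look like in practice on a Linux host. Everything in it is illustrative: the model path, published hash, service account name and read-only connection string are placeholders for whatever your environment actually uses, not values from any specific product.

```python
import hashlib
import os
import pwd  # Unix-only; on Windows the same idea is a dedicated low-privileged service account

# Illustrative placeholder values only.
MODEL_PATH = "/opt/models/downloaded-model.safetensors"
PUBLISHED_SHA256 = "<checksum published by the model provider>"
SERVICE_ACCOUNT = "llm-runner"  # dedicated account with no rights beyond what inference needs
READ_ONLY_DSN = "postgresql://report_ro@db/reports"  # read-only database user for any RAG tools

def verify_download(path: str, expected_sha256: str) -> None:
    """Refuse to load a model file whose checksum doesn't match what the publisher states."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        raise RuntimeError("model checksum mismatch - do not load this file")

def drop_privileges(username: str) -> None:
    """Switch the process to the low-privileged account before any inference happens."""
    account = pwd.getpwnam(username)
    os.setgroups([])
    os.setgid(account.pw_gid)
    os.setuid(account.pw_uid)

verify_download(MODEL_PATH, PUBLISHED_SHA256)
drop_privileges(SERVICE_ACCOUNT)
# From this point on, the model and any tools it calls run as 'llm-runner' and
# only reach the database through the read-only connection string above.
```

Pair this with a sandboxed environment (a container or VM with no outbound network access) and a human reviewing the output of anything critical, and a backdoored model has far less room to do damage.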
[1] Saltzer and Schroeder, “The Protection of Information in Computer Systems” (1975), https://web.mit.edu/Saltzer/www/publications/protection/
[2] Roger Needham, “Protection Systems and Protection Implementations”, Proc. 1972 Fall Joint Computer Conference, AFIPS Conf. Proc., vol. 41, pt. 1, pp. 571–578
[3] Peter J. Denning, “Fault Tolerant Operating Systems”, ACM Computing Surveys, vol. 8, no. 4 (Dec. 1976), pp. 359–389, https://doi.org/10.1145/356678.356680
[4] OWASP Top 10 for LLM Applications, “LLM06:2025 Excessive Agency”, https://genai.owasp.org/llmrisk/llm062025-excessive-agency/