From the course: Mitigating Prompt Injection and Prompt Hacking
What is prompt hacking?
From the course: Mitigating Prompt Injection and Prompt Hacking
What is prompt hacking?
- Prompt hacking means changing the way large language models, or LLMs, operate by manipulating the answers they give you. Hackers do this by injecting or adding commands to existing prompts. For something like ChatGPT, this is a small issue since you might be able to trick it into giving you some dangerous or harmful advice, but this impacts only the results received from ChatGPT. As companies adopt LLMs into their own platforms, they're connecting LLMs to internal data and often giving users the ability to prompt. Prompt hackers can take advantage of this to send malicious prompts that might expose some data. LLMs from major companies like PaLM 2, GPT, Lama, and others use risk mitigation strategies to account for these attempts. However, as the AI landscape is growing, hackers are incentivized to find new ways of challenging the systems. Now, in addition, as a consumer, when sharing your personal data with products and tools that are based on these LLMs, you're often not giving clarity into the version or the mitigation strategies that have been implemented into the system, so you don't know how well they protect the data that you've entrusted them with. After all, companies can install LLMs with varying levels of guardrails, including those with little to no protection whatsoever. Part of implementing an AI security plan should include a thoughtful approach to dealing with prompt hacking.