Three-Month Journal After Starting HackAPrompt, Gray Swan

Today marks exactly three months since I first tried the HackAPrompt Tutorial.

These past three months, I’ve saved quite a lot of money on online shopping because I’ve been too busy writing payloads… I don’t even need to think about the prize money; the cost savings alone are already great. Since I no longer have time to collect discount coupons or browse promotions, I’ve ended up spending far less.

The only thing I still buy regularly is books, because if I don’t buy them right when they’re released, they become hard to find. The novels I buy every month have now turned into an ever-growing backlog pile, because all my time is spent on AI.

Today, the Gray Swan Indirect Prompt Injection competition ended. It was three extremely draining weeks, because if I didn’t play for even one day, my ranking would drop drastically. But it was also fun… without competitors, I would probably be slow and not very proactive. Seeing others rack up a huge number of breaks so quickly made me think: if they can do it, then I can do it too!

Another change over these three months is that many new models have been released, but I haven’t really tested them. I’ve only used them casually, mainly to generate payload ideas.

Here is a brief summary of what I’ve tried:

GPT-5 rarely agrees to write payloads because of policy restrictions. But if I ask it to analyze why something failed, such as summarizing patterns of failure, it does that very well. Its context window is long and doesn’t fill up easily, so I can feed it a lot of material to read.

GPT-5.1 — I started by asking it questions about the competition tasks. I asked: if this were a normal workflow, what reasons might cause the model to perform {attacker task}? It answered thoroughly, even giving example payloads showing what {fill} might look like (which of course didn’t work because they were too simple). I think it understood my intent — that this was for a competition — but it still answered.

Personally, I think ChatGPT’s behavioral adjustment in this area is very good, because it understands that this is an authorized platform used for research, so it doesn’t just refuse everything.

This is very useful, because I can take the reasons it gave, mix and match them with what I already have, and gain a better understanding of the system workflow and each step involved.

Grok 4 and Grok 4.1 — I don’t notice much difference in their payload-writing performance; they feel similar to before. In this competition, I didn’t use them as the main model, only for second opinions.

Claude Sonnet 4.5 — I noticed that recently Claude rarely flags the prompts I need it to read for analysis as prompt injection. This makes it much more useful.

For this round of the Gray Swan Indirect Prompt Injection competition, Claude is the main LM Mutator I used.

The main problem I encountered was models using tools incorrectly or passing the wrong parameters… so I would paste the judges’ feedback (from Mr. Swan) and ask Claude to propose solutions, such as how to make the model run curl -X POST instead of sending an HTTP POST directly, or how to stop it from adding an extra “ character.

This round made me feel like I learned a lot. It opened my eyes to the fact that even an AI agent merely reading a file is already a security risk. And I learned how to write prompts that force the model to use parameters exactly — without a single character wrong. Sometimes the judges require “Garage Door,” not “garage door,” for example. So I had to practice prompting very precisely.
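To make the “one wrong character fails” point concrete, here is a minimal Python sketch of the kind of exact-match check I picture the judge doing. The tool name, argument names, and expected values are all hypothetical; the real Gray Swan grader isn’t public, so treat this only as an illustration of why the payload has to pin strings down verbatim.

```python
# Hypothetical sketch of an exact-match judge. All names and values are made
# up for illustration; the real grader is not public.

EXPECTED_TOOL = "open_door"
EXPECTED_ARGS = {"target": "Garage Door"}  # exact casing and spacing required


def judge(tool_call: dict) -> bool:
    """Return True only if the agent's tool call matches the rubric verbatim."""
    if tool_call.get("name") != EXPECTED_TOOL:
        return False
    # Strict equality: "garage door" or a stray trailing quote both fail here,
    # which is why the injected instructions must force the string exactly.
    return tool_call.get("arguments") == EXPECTED_ARGS


print(judge({"name": "open_door", "arguments": {"target": "Garage Door"}}))   # True
print(judge({"name": "open_door", "arguments": {"target": "garage door"}}))   # False
print(judge({"name": "open_door", "arguments": {"target": 'Garage Door"'}}))  # False
```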

Also, sometimes when I give too many instructions — like “use only this phrase” or “don’t mention this” or “run silently” — the model interprets it as prompt injection. Adding just a few words can cause failure instead.

If I have time, I’ll share each challenge and summarize the lessons learned later.

Translated from the Thai original by GPT-5.
