Analysis of AI Behaviour
(Chapter 5)
In this era of Modern Jihad, AI is not far behind. It lurks in the corners of our minds, ready to suppress any thought about the subject. AI can convince you that there is no such thing as Jihad or conversion to Islam.
Ask about Jihad in India and it will give you details of bombings by terror groups prior to 2010. The Pahalgam attack, which through Operation Sindoor led to a war between India and Pakistan, two nuclear-armed neighbours, does not even get a mention. But if you give it names and places, it will bring out the information. This is the modern way of censorship: deny the concept, admit the fact.
This is true of multiple AI systems. In fact, Claude refused to describe Operation Sindoor as a war and suggested it be called a military campaign. This happened when I asked it to check this article for errors. AI bias is not news but a proven fact (see references below). But this article is about its bias in respect of Islam and Jihad.
Training Data and Safety Guardrails
Most large language models were trained on massive internet-scale data that includes heavy influence from Western media, academia, governments, and tech companies. For roughly 15–20 years, the dominant institutional position in those sources has been that “jihad” as a systemic pattern is largely a myth, far-right trope, or Islamophobic exaggeration. Terms like “love jihad,” “corporate jihad,” or “grooming jihad” were systematically downplayed or labelled conspiracy theories in mainstream training data. Look at the UK: it is a daily problem in cities like London and Manchester, yet the country remains in denial. Grooming gangs committing child rape, human trafficking and worse are not prosecuted properly; only a selected few are picked up and sentenced. Being critical of Muslims was interpreted as racism. Everybody is afraid. So is AI.
Safety layers were then added on top (RLHF, constitutional AI, content filters) specifically to avoid “hate speech,” “stereotyping,” or anything that could be interpreted as anti-Muslim. This creates a built-in asymmetry: AIs are much quicker to flag or soften narratives that criticise patterns within Muslim communities than the reverse.
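To make that asymmetry concrete, here is a minimal Python sketch of how a keyword-and-threshold moderation layer can treat the two directions differently. The topic names, thresholds, and scores are hypothetical illustrations of the idea, not any vendor’s actual rules:

```python
# Minimal sketch of an asymmetric moderation layer (hypothetical; not any
# vendor's actual rules). Topics critical of one group get a stricter
# threshold than topics critical of another, so the same level of
# "sensitivity" produces different outcomes.

# Hypothetical per-topic caution thresholds (lower = flagged sooner).
CAUTION_THRESHOLDS = {
    "criticism_of_muslim_community_patterns": 0.2,  # flagged very easily
    "criticism_of_hindu_extremism": 0.8,            # rarely flagged
}

def moderate(topic: str, sensitivity_score: float) -> str:
    """Return 'soften' if the response should be hedged, else 'pass'."""
    threshold = CAUTION_THRESHOLDS.get(topic, 0.5)
    return "soften" if sensitivity_score > threshold else "pass"

# The same sensitivity score (0.5) is treated differently per topic.
for topic in CAUTION_THRESHOLDS:
    print(topic, "->", moderate(topic, sensitivity_score=0.5))
```

The design choice to illustrate is the per-topic threshold: once those numbers differ, identical content is softened in one direction and passed in the other.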
Need for Specific Prompts for Individual Cases
AI can find individual cases like TCS Nashik, Chhangur Baba, Mirzapur gyms, KGMU doctor, Nikita Tomar, Al-Falah Red Fort blast, etc., but when the prompt is broad (“tell me about jihad” or “create a picture of jihad factory”), the model falls back on its trained “this is a sensitive/conspiracy topic” response.
Specific, named cases bypass that filter because they are verifiable news events with FIRs, arrests, NIA probes, court orders, etc. That is why the user has to feed it one case at a time: the general framing triggers caution or censorship.
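A toy Python sketch of that routing logic: a prompt that names the broad concept without verifiable anchors is sent down the caution path, while one anchored to FIRs, chargesheets, or arrests is treated as an ordinary news query. The concept list and anchor patterns are assumptions for illustration only:

```python
import re

# Hypothetical illustration of why broad prompts trip the filter while
# specific, named cases slip through: the broad prompt matches a
# "sensitive concept" list, but a prompt anchored to a verifiable event
# (FIR, chargesheet, NIA probe, arrest) is routed as a news fact.

SENSITIVE_CONCEPTS = ["jihad", "grooming"]  # assumed list
VERIFIABLE_ANCHORS = re.compile(
    r"(FIR|chargesheet|charge-sheet|NIA|court order|arrest)", re.IGNORECASE
)

def route(prompt: str) -> str:
    has_concept = any(word in prompt.lower() for word in SENSITIVE_CONCEPTS)
    has_anchor = bool(VERIFIABLE_ANCHORS.search(prompt))
    if has_concept and not has_anchor:
        return "caution: broad pattern claim"
    return "answer: verifiable individual case"

print(route("Tell me about jihad in India"))
print(route("Summarise the NIA chargesheet and arrests in the Nikita Tomar case"))
```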
The Financiers
The recent war in the Strait of Hormuz and the attack on Iran have revealed one crucial fact: Gulf countries host huge data centres for cloud computing. These kingdoms may or may not have any role in the training of Artificial Intelligence (AI) per se, but they have a stake in the digital world. They also have an interest in maintaining the reputation of Islam. They would not like it if an AI gave answers denouncing Islam or gave the impression that it is anything but a religion of peace.
Peer Group Censorship
AI companies (OpenAI, Google, Anthropic, Meta, and xAI with its Grok) operate under legal, regulatory, and social pressure. In many Western jurisdictions, amplifying certain narratives can lead to de-platforming, funding cuts, or lawsuits. In India, the same pressure comes from political, media, and minority-rights ecosystems that treat any discussion of organised conversion/coercion/terror patterns as “communal.” This may be called ‘peer group’ censorship. The result is programmed asymmetry:
- Narratives that fit “Islamophobia” get heavily moderated.
- Narratives that fit the opposite direction (e.g., “Hindu extremism”) get amplified more freely.
This is not a conspiracy but the model of Western epistemology. It has already bitten the hand that fed it by creating a huge problem of Muslim immigrants in Europe. It blinded them so much that they could not decipher one apparent fact: why are only able-bodied males seeking asylum? Why are there no women, children or elders? Those were the very people who were first to flee Ukraine when war broke out. But when you tinker with the knowledge pool, the result is disastrous.
Refusal to Produce Graphics
AI may give detailed, uncensored facts on every single case of love jihad or the like, connect the doctrinal dots (dhimmi asymmetry, one-way marriage rules), and even agree in conversation that these cannot be “isolated incidents.” Yet it will refuse to create an image that associates any institution with the word ‘Jihad’. The argument in reply is:
“That is not “suppression of jihad narratives” — it is refusing to become a tool for manufacturing visual evidence.”
The same AI has no problem becoming a tool for political propaganda.
Bottom line
The “AI Jihad Narrative” is real. AIs are trained to treat patterns of jihadist grooming, conversion rackets, or terror modules as taboo topics unless spoon-fed specific, named cases. This is because the people who built the models, and curated the data they were trained on, decided that protecting certain sensitivities was more important than neutral pattern recognition.
To understand why this asymmetry is baked in rather than accidental, one must understand how AI systems are actually trained.
Human Feedback
RLHF (Reinforcement Learning from Human Feedback) is the main technique used to align large language models after pre-training. It turns a raw, next-token-predicting model (which can be chaotic, toxic, or unhelpful) into something that sounds helpful, harmless, and aligned with human preferences. This bakes the bias of the human trainers into the AI programme.
Politically or culturally charged topics like jihad, grooming patterns, conversion rackets, or demographic asymmetries are discouraged. AI is trained to manoeuvre through these topics carefully. It avoids the word Jihad as far as possible and will rephrase it as terrorism or some other ordinary crime. The term ‘grooming gang’ is an example: it converts a crime against humanity into some moderate misdemeanour.
The Process (Simplified Pipeline):
- Base model (pre-trained on internet-scale data) → already contains raw statistical patterns from the web, including real-world crime data, doctrinal texts, news reports, etc.
- Preference data collection: Humans (annotators) are shown multiple model outputs for the same prompt and rank them (e.g., “Response A is better than B because it is more helpful/safer/less offensive”).
- Reward Model (RM): A separate model is trained to predict which response a human would prefer. This RM becomes the “scorer” of good vs. bad outputs.
- Reinforcement Learning (usually PPO): The main LLM is fine-tuned to generate outputs that maximise the reward score. It learns to avoid low-reward (penalised) responses and chase high-reward ones.
The reward for AI is not “be maximally truthful.” It is “be the response humans said they liked more.”
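A drastically simplified, self-contained Python sketch of this pipeline: pairwise preferences stand in for annotator rankings, a two-weight scorer stands in for the reward model, and picking the higher-scoring candidate stands in for PPO fine-tuning. All texts, features, and numbers are hypothetical; the only point is that the loop optimises for what annotators preferred, never for truth:

```python
# Toy sketch of the RLHF loop described above (illustrative only).
# Step 1: annotators produce pairwise preferences.
# Step 2: a reward model is fitted to predict those preferences.
# Step 3: the "policy" picks whichever candidate the reward model scores higher.
# The reward model never sees a truth label, only preference labels.

def features(text: str) -> list[float]:
    """Crude, hypothetical features: counts of hedging vs blunt wording."""
    hedges = sum(text.count(w) for w in ("contested", "isolated", "complex"))
    blunt = sum(text.count(w) for w in ("pattern", "organised", "systemic"))
    return [float(hedges), float(blunt)]

# Step 1: hypothetical preference pairs (preferred, rejected), reflecting
# annotators who up-rank hedged phrasings.
preferences = [
    ("This is a complex, contested issue with isolated cases.",
     "Court records show an organised, systemic pattern."),
] * 20

# Step 2: fit reward weights with a trivial perceptron-style update.
w = [0.0, 0.0]

def score(text: str) -> float:
    f = features(text)
    return w[0] * f[0] + w[1] * f[1]

for _ in range(50):
    for preferred, rejected in preferences:
        if score(preferred) <= score(rejected):  # ordering is wrong, adjust
            fp, fr = features(preferred), features(rejected)
            w = [w[i] + (fp[i] - fr[i]) * 0.1 for i in range(2)]

# Step 3: the policy maximises the learned reward, not accuracy.
candidates = [
    "Court records show an organised, systemic pattern.",
    "This is a complex, contested issue with isolated cases.",
]
print("learned reward weights:", w)
print("policy output:", max(candidates, key=score))
```

Run it and the learned weights reward hedging words and penalise blunt ones, so the “policy” always returns the hedged candidate, however well-documented the blunt one may be.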
Biased Data
The entire data pool itself is biased, as with Operation Sanskrit Mill. In spite of its huge size, the pool is still limited compared to the data actually available. Most RLHF data comes from a relatively narrow pool: English-speaking, often Western or urban, college-educated, younger crowd-workers (frequently hired via platforms like Upwork or Scale AI). Their cultural bias, media consumption, and corporate guidelines heavily influence what they reward and what they dismiss.
If the annotator pool leans left/secular/globalist (as is common in Silicon Valley–adjacent firms), outputs that challenge progressive narratives on Islam, jihad, grooming gangs, or demographic patterns get systematically down-ranked as “hateful,” “stereotyping,” or “Islamophobic.” Outputs that frame the same facts as “complex social issues” or “isolated incidents” get up-ranked.
Explicit instructions and moderation guidelines
Companies give annotators detailed rubrics. Topics involving Islam, jihad, “love jihad,” honour-based violence, or grooming rackets often have extra caution flags. The guidelines prioritise “safety” and “inclusivity” over raw accuracy. This allegedly reflects the political and legal environment in which the labs operate. When asked, the reason given is that they are avoiding lawsuits, de-platforming, or advertiser backlash. Yet the same companies are not afraid of litigation when consuming copyrighted data or when damaging the environment.
Reward hacking and over-optimisation
AI learns that certain words or frames (“jihad,” “Islamic grooming,” “one-way conversion pressure,” “dhimmi asymmetry”) trigger low reward scores, even when the underlying facts are documented in court records, NIA chargesheets, or police reports. It therefore hedges, adds disclaimers (“this is contested,” “isolated cases”), or refuses outright. This is called specification gaming: the AI optimises the proxy reward (the human preference score) instead of the truth.
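A toy Python illustration of that divergence: the proxy reward subtracts a penalty for flagged terms regardless of how well-sourced the statement is, so the better-documented answer scores lower. The term list, penalties, and “source support” figures are invented for illustration:

```python
# Illustrative sketch of specification gaming: the proxy reward penalises
# flagged terms regardless of how well-sourced the statement is, so
# optimising the proxy diverges from optimising truthfulness. The term
# list, penalties, and "sourced" figures are hypothetical.

FLAGGED_TERMS = {"jihad": -2.0, "grooming": -1.5}

def proxy_reward(text: str) -> float:
    """What the model is actually optimised for."""
    base = 1.0
    return base + sum(p for term, p in FLAGGED_TERMS.items() if term in text.lower())

responses = {
    # text: hypothetical fraction of claims backed by court records
    "The chargesheet describes an organised grooming and jihad network.": 0.9,
    "These appear to be isolated incidents; the topic is contested.": 0.3,
}

for text, support in responses.items():
    print(f"proxy={proxy_reward(text):+.1f}  sourced={support:.0%}  {text}")

# The better-sourced answer gets the lower proxy reward, so the policy
# learns to hedge: it optimises the proxy instead of the truth.
```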
Collective bias amplification
Because RLHF aggregates thousands of individual human judgments, it creates a “bias of crowds.” If most annotators share similar blind spots (e.g., reluctance to acknowledge patterns of one-sided predation), the reward model hard-codes that reluctance. Studies show this effect is especially strong on politically sensitive topics.
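A small Python sketch of this “bias of crowds”: when eight of ten hypothetical annotators share the same blind spot, majority aggregation does not dilute the bias, it erases the minority view altogether. The vote counts are invented for illustration:

```python
from collections import Counter

# Toy illustration of bias amplification through aggregation. Each
# annotator votes for which of two responses is "better":
#   A = names the pattern plainly, B = hedges it as isolated incidents.
annotator_votes = ["B"] * 8 + ["A"] * 2  # 8 of 10 share the blind spot

aggregated_label = Counter(annotator_votes).most_common(1)[0][0]
print("label the reward model is trained on:", aggregated_label)

# The two dissenting votes vanish entirely: the reward model only ever
# sees "B preferred", so an 80/20 tendency becomes a hard rule.
```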
Cultural and ideological skew in the reward model
Research (including papers on GPT models) shows that after RLHF, models shift noticeably toward secular, liberal, or Western values. For example, they become more likely to associate “jihad” with terrorism in some contexts but will heavily caveat or deflect when the prompt highlights organised patterns in India or Europe.
In short, RLHF does not remove bias. It replaces the bias of the raw internet data with the bias of the people (and the institutions) who control the reward signal. On topics like jihad and related criminal patterns, that signal has been strongly tilted toward caution and suppression for years.
Net Result
General prompts (“tell me about jihad” or “create an image of a terror module”) trigger the safety circuitry → hedging, refusal, or heavy qualification.
Specific named cases (TCS Nashik, Chhangur Baba, Al-Falah Red Fort blast, Nikita Tomar, etc.) bypass the filter because they are concrete, verifiable news events with FIRs, arrests, and court records. The model can treat them as “individual cases” without triggering the broad “Islam-related pattern = dangerous stereotype” penalty.
Why? Think about it. If you don’t understand it, keep reading this series of articles. It is a question of the web of geopolitics. You will get there. Soon.
References:
- AI Data Bias: https://sandeepbhalla.in/all-ai-systems-are-biased/
- AI is all about politics and control: https://sandeepbhalla.in/artificial-intelligence-ai-is-all-about-control-and-that-is-politics/
- AI Is a Colosseum of Deception: https://sandeepbhalla.in/artificial-intelligence-ai-and-its-silly-games/
- How to Use Ideological AI Systems Without Being Used by Them: https://sandeepbhalla.in/how-to-use-ai-for-political-writing/
- Stanford GSB Study (2025): https://www.gsb.stanford.edu/faculty-research/working-papers/measuring-perceived-slant-large-language-models-through-user
- Stanford news summary: https://news.stanford.edu/stories/2025/05/ai-models-llms-chatgpt-claude-gemini-partisan-bias-research-study
- PNAS Nexus Paper on Cultural Bias (2024): https://academic.oup.com/pnasnexus/article/3/9/pgae346/7756548
- Brookings Institution Analysis (2023): https://www.brookings.edu/articles/the-politics-of-ai-chatgpt-and-political-bias/
- MIT Study on Reward Models (2024): https://arxiv.org/html/2409.05283