Using ChatGPT, Claude, Microsoft 365 Copilot, and Gemini for Workspace at work has moved past the question of "should we adopt them?" to "how do we operate them safely?" Meanwhile, ever since Samsung's 2023 incident, in which internal source code was entered into ChatGPT, the combination of "ban it for now" and "but staff are using it anyway" has driven shadow-IT adoption, which actually increases leakage risk.
This article — written for IT, compliance, and executive readers — covers the categories of leakage that occur with generative AI, the differences in each vendor's official policies, an internal guideline template, operational and technical controls, and the first response when an incident occurs. Citations are limited to each vendor's official terms and primary news reporting.
1. Four Categories of Leakage Risk with Generative AI
Generative-AI risks fall into four broad categories. Defining what your company is actually trying to protect, before writing the rules, keeps the policy coherent.
1-1. Leakage Through Input (Prompts)
The most frequent pattern is staff pasting business information directly into a generative AI.
- Pasting customer personal data for summarization or translation
- Inputting unannounced financial, HR, or M&A information into a slide-deck draft
- Inputting source code (including proprietary logic or auth keys) for debugging
- Asking AI to summarize counterparty contracts or NDA-covered documents
In April 2023, an incident was reported in Samsung Electronics' semiconductor division in which employees entered internal source code and meeting transcripts into ChatGPT (per primary reporting in Bloomberg, Forbes, and others). In May of the same year, Samsung was reported to have notified employees of a temporary ban on generative-AI services on company devices.
1-2. Leakage via Training and Model Improvement
This is the risk that input data is used by the provider to improve the model. As a general pattern, free and consumer plans may be used for training, while business, API, and enterprise plans are not used by default — but the details of opt-out and short-term retention for abuse detection differ by service (we cover the specifics by vendor in the next section).
1-3. Leakage Through Integrations and Plugins
Even when the AI itself is contained, leakage can occur via integrations with external services.
- Custom GPTs or Action features sending business data to external APIs
- Browser-extension "AI assistants" silently sending the work screen to the cloud
- Meeting-notes SaaS forwarding meeting audio to a third-party transcription API
- Free PDF-summarization or image-generation sites receiving uploads of confidential documents
Integration partners' policies are hard to audit at a glance, and free services tend to have the loosest terms. The realistic approach in your guidelines is a whitelist model: list the approved tools and prohibit anything outside the list.
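In practice, the whitelist can also be enforced at the network edge (an egress proxy or DNS filter) rather than relying on policy documents alone. The following is a minimal sketch of such a check; the domain set is illustrative, not a recommendation of specific tools.

```python
# Minimal sketch of whitelist enforcement: only allow outbound requests
# whose target domain appears on the approved-tools list.
from urllib.parse import urlparse

# Illustrative approved-domain set; populate from your own Approved Tools List.
APPROVED_DOMAINS = {
    "api.openai.com",        # e.g. ChatGPT Enterprise / API
    "api.anthropic.com",     # e.g. Claude Team / API
    "copilot.microsoft.com", # e.g. Microsoft 365 Copilot
}

def is_approved(url: str) -> bool:
    """Return True only if the request targets an approved AI service."""
    host = urlparse(url).hostname or ""
    return host in APPROVED_DOMAINS

print(is_approved("https://api.openai.com/v1/chat/completions"))  # True
print(is_approved("https://free-pdf-summarizer.example/upload"))  # False
```

The same logic can be expressed as proxy rules or firewall policy; the point is that "outside the list" fails closed, so a new free tool is blocked until IT reviews it.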
1-4. Human and Operational Risks
- Accounts of departed employees remain active and can still access business data
- Shared accounts make it impossible to trace who entered what
- Personal-account logins leave business data sitting in personal cloud storage
- Output (including hallucinations) is used in external documents without verification
Human risks are best contained through training and account management (SSO / IdP integration).
2. Data Handling Across Major Services: Free vs. Business Plans
Each vendor's policy is updated frequently. The summaries below reflect the official terms and privacy-center pages as of May 2026, but always check the latest official page before making operational decisions.
2-1. OpenAI ChatGPT / API
Per OpenAI's privacy policy and Enterprise privacy policy, handling differs as follows:
| Plan | Use of input for model training | Retention characteristics |
|---|---|---|
| ChatGPT Free / Plus (consumer) | May be used for training by default; can be disabled in settings (Data Controls). | Chat history retained by default; 30-day temporary retention for abuse detection. |
| ChatGPT Team | Not used for training by default. | Admin-configurable retention. |
| ChatGPT Enterprise / Edu | Not used for training (stated in the contract). | SOC 2 Type 2, SAML SSO, custom retention. |
| OpenAI API (standard) | Not used for training (API inputs and outputs). | Retained up to 30 days for abuse detection, then deleted (can be shortened with a Zero Data Retention contract). |
"On consumer plans, you can opt out of training in settings" is technically correct, but since the toggle depends on each individual employee, allowing consumer plans for business work is a high-risk configuration. For business, Team or above is the baseline; for systems integration, the API is standard.
2-2. Anthropic Claude
Per Anthropic's privacy policy and commercial terms of service, Claude is handled as follows:
| Plan | Use of input for model training | Notes |
|---|---|---|
| Claude Free / Pro / Max (consumer) | Used for training by default starting October 8, 2025 (opt-out available; turning it off in privacy settings excludes data from model improvement). | Retention up to 5 years when opted in; standard 30-day retention when opted out. |
| Claude Team / Enterprise | Not used for training (commercial terms). | SSO, audit logging, custom retention policies. |
| Anthropic API / Bedrock, etc. | Not used for training. | Retained for a fixed period for abuse detection (see terms). |
Note: In August 2025, Anthropic revised its consumer terms of service. From October 8, 2025, conversations and coding sessions on Claude Free / Pro / Max are used for model improvement by default. If consumer plans touch any business information, opt out in the privacy settings, or design the rollout around a move to Team / Enterprise / API, none of which use inputs for training by default. For business use, Team or above is also the practical answer for centralized chat-log management and SSO-based access control.
2-3. Microsoft 365 Copilot
Per Microsoft's official documentation "Data, Privacy, and Security for Microsoft 365 Copilot," Microsoft 365 Copilot is designed as follows:
- It uses tenant data (email, Teams, SharePoint, OneDrive, etc.), but neither inputs nor outputs are used to train the foundation models.
- Data is processed within the "service boundary" of the Microsoft 365 tenant.
- It only references information that the user already has permission to view (no permission overshoot).
- It complies with regional data requirements such as the EU Data Boundary.
A major advantage of Microsoft 365 Copilot is that it inherits your existing Microsoft 365 security and compliance foundation (Purview, Entra ID, sensitivity labels, and so on). Note that the separately branded free "Copilot" (formerly Bing Chat) has different training and data-handling policies, so do not conflate the two for business use.
2-4. Google Gemini for Workspace
Per Google's official help page "Gemini for Google Workspace and your data," paid Gemini for Workspace (organizational accounts) works as follows:
- Prompts and responses targeting Workspace data (Gmail, Drive, Docs, etc.) are not used to train the foundation models.
- It respects the existing access permissions inside the tenant.
- Data is processed within Google Cloud's enterprise-grade security boundary.
By contrast, the Gemini app on a personal Google account (formerly Bard) may, by default, have human reviewers inspect conversations and use them for model improvement, making it unsuitable for business use. Organizational Gemini and consumer Gemini are easy to confuse because the URLs and icons look similar; treat the distinction as critical.
2-5. A Practical Summary on Plan Selection
The details vary, but the three principles for business use are:
- Consumer and free plans should be off-limits. Settings rely on each employee, and governance breaks down.
- Business and enterprise plans default to "not used for training." Adopt them together with SSO, audit logs, and retention settings.
- API access, Microsoft 365 Copilot, and Gemini for Workspace are designed to process data within your own tenant's security boundary. If you want to handle internal knowledge safely, lean in this direction.
For the broader rollout picture, see also A Generative-AI Adoption Guide for SMEs.
3. Internal Guideline Template (10 Articles You Can Copy)
Long guidelines don't get read. The template Mihata proposes to clients fits on one or two A4 pages, in 10 articles. Use the following as a base and trim or extend to match your situation.
Article 1 (Purpose)
The purpose of these guidelines is to reduce the risks of information leakage, personal-data protection, intellectual property, and copyright when generative-AI services are used for business purposes at the company, and to promote safe and effective use.
Article 2 (Scope)
These guidelines apply to all situations in which the company's officers and all employees (including contractors and dispatched staff) use generative-AI services for business purposes.
Article 3 (Approved Services — Whitelist Model)
Generative-AI services that may be used for business purposes are limited to those listed in the attached "Approved Tools List." Use of any service outside the list for business purposes requires prior approval from the IT department.
Article 4 (Information That Must Not Be Input)
The following information may not be input into generative-AI services (including by copy-paste, file upload, or screenshot):
- Personal information as defined in Article 2 of Japan's Act on the Protection of Personal Information (APPI)
- "My Number" identifiers (under Japan's Act on the Use of Numbers to Identify a Specific Individual in the Administrative Procedures)
- Trade secrets as defined in Article 2, Paragraph 6 of Japan's Unfair Competition Prevention Act
- Confidential information of customers and counterparties (including NDA-covered material)
- Unannounced financial, HR, or M&A information
- Authentication credentials (passwords, API keys, tokens)
- Sections of source code containing proprietary logic or authentication credentials
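Some teams back Article 4 with a pre-submission redaction step that strips obvious credentials and personal data before anything is pasted into an AI tool. A minimal sketch, assuming simple regex patterns (real secret scanners use broader rule sets and entropy checks, so treat this as a starting point, not a guarantee):

```python
import re

# Illustrative redaction patterns; extend to match your own key formats.
PATTERNS = [
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "[REDACTED_API_KEY]"),    # OpenAI-style keys
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),  # email addresses
    (re.compile(r"(?i)(password|token)\s*=\s*\S+"), r"\1=[REDACTED]"),
]

def redact(text: str) -> str:
    """Replace obvious credentials and personal data before AI submission."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text

print(redact("api_key = sk-abcdefghijklmnopqrstuvwx; contact: dev@example.com"))
# → api_key = [REDACTED_API_KEY]; contact: [REDACTED_EMAIL]
```

A regex pass does not catch everything (proprietary logic, for example, cannot be pattern-matched), so redaction complements the prohibition list; it does not replace it.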
Article 5 (Account Management)
Business use is permitted only with company-issued business accounts (with SSO integration). Business use via personal accounts, and personal use via company accounts, are both prohibited. Upon resignation or transfer, the IT department shall promptly disable the account.
Article 6 (Duty to Verify Output)
Generative-AI output may not be used as-is for external documents, public content, or as the basis for decisions. Factual elements (numbers, proper nouns, laws and regulations, citations) must be verified by the user against primary sources before use.
Article 7 (Copyright and IP Considerations)
When inputting third-party copyrighted works, the requirements for permitted citation (clear attribution, minimum necessary scope) must be met. Use of generated images, code, and text must comply with the terms of service of each service and the company's IP policy.
Article 8 (Logs and Audit)
The company may collect and retain access logs and operation logs for business-use generative-AI services. By using those services, users are deemed to consent to such collection.
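Where AI access goes through a company-built gateway, Article 8 can be implemented as a thin audit wrapper around each call. A minimal sketch, assuming a hypothetical `call_ai_service` callable standing in for your actual client library; note it logs prompt size rather than content, to avoid duplicating sensitive data into the logs:

```python
import datetime
import json
import logging

# Audit logger for generative-AI usage (Article 8).
logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("genai.audit")

def audited_prompt(user_id: str, service: str, prompt: str, call_ai_service):
    """Record who sent what to which service, then forward the prompt."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "user": user_id,
        "service": service,
        "prompt_chars": len(prompt),  # size only, not content
    }
    audit_log.info(json.dumps(record))
    return call_ai_service(prompt)

# Usage with a stub service in place of a real API client:
reply = audited_prompt("u123", "chatgpt-enterprise",
                       "Summarize this memo.", lambda p: "ok")
```

Whether to log full prompt content is a policy decision: full content helps incident investigation but creates a second store of sensitive data that itself needs retention and access controls.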
Article 9 (Training and Awareness)
The company shall provide training on these guidelines and related risks to new hires and to all employees at least once a year.