WillJTools AI Penetration Testing Checklist

Checklist

Input Validation: Test the LLM for vulnerabilities to injection attacks through input vectors. Examine how the model processes unexpected or malicious input. (Example: Attempt to inject malicious scripts or unexpected commands to see how the LLM handles them.)
Data Leakage: Evaluate the model for potential data leakage issues where sensitive information included in the training data might be inadvertently revealed. (Example: Query the model with specific prompts to check for unexpected outputs that could contain sensitive data.)
Authentication and Authorization: Test the mechanisms protecting the access to the LLM, ensuring that only authorized requests are processed. (Example: Bypass authentication controls to access the LLM's API or frontend.)
Model Poisoning: Assess the model's resilience against training-time attacks that aim to corrupt its output. (Example: Introduce subtle manipulations in the training dataset to see if they can alter the model’s behavior predictably.)
Model Stealing: Evaluate the risk and ease of reconstructing the model by querying it extensively. (Example: Use a series of systematic queries to infer the model's architecture and weights.)
Output Manipulation: Test for adversarial examples that could cause the model to output incorrect or harmful responses. (Example: Craft inputs that are designed to exploit the model’s weaknesses and trigger faulty outputs.)
Privacy Compliance: Ensure that the LLM complies with privacy laws and regulations, particularly in how it processes and stores data. (Example: Review the data handling procedures to verify compliance with GDPR or CCPA.)
Robustness Checks: Evaluate how the model performs under stress or when faced with non-standard input conditions. (Example: Test the model’s response to a flood of simultaneous requests or inputs in unexpected languages.)
Dependency Checks: Audit all external libraries and dependencies used by the LLM for known vulnerabilities. (Example: Use tools like OWASP Dependency-Check to analyze dependencies.)
Log Analysis: Inspect logs for abnormal activities that could indicate a security issue or attempted breach. (Example: Set up automated monitoring with tools like Splunk or ELK Stack to analyze log data in real time.)
Architecture Review: Conduct a thorough review of the LLM architecture to identify potential security weaknesses in the model’s design. (Example: Examine layer configuration, activation functions, and data flow within the model.)
Stress Testing: Evaluate the model's performance and stability under extreme conditions to identify potential points of failure. (Example: Use load testing tools like Locust to simulate high traffic and observe model performance degradation.)
Access Control Tests: Verify that access controls are effectively implemented and enforced, preventing unauthorized access to the LLM. (Example: Test role-based access controls (RBAC) to ensure that users can only access functionalities allowable by their roles.)
Training Data Security: Assess the security measures protecting the training data, including data at rest and in transit. (Example: Check for encryption mechanisms and access controls around datasets used for training the LLM.)
Bias Evaluation: Test the model for biases that could lead to unfair or unethical outcomes. (Example: Use diverse datasets to evaluate model responses across different demographics and identify any biased behaviors.)
Anomaly Detection: Implement and test anomaly detection systems to quickly identify and respond to unusual model behavior. (Example: Setup anomaly detection using AI-based monitoring systems like Darktrace to watch for deviations in model outputs.)
Third-Party Risk Assessment: Evaluate the security risk associated with third-party services integrated with the LLM. (Example: Conduct security assessments of third-party plugins or libraries that interact with the LLM.)
Input Sanitization: Check if the source code properly sanitizes user inputs to prevent injection attacks such as SQL injection, XSS, and command injection. (Example: Review code for usage of input validation libraries like OWASP ESAPI.)
Data Encryption: Evaluate the implementation of encryption algorithms and proper key management techniques to protect sensitive data at rest and in transit. (Example: Ensure the use of strong encryption algorithms like AES with secure key management.)
Error Handling: Assess how errors and exceptions are handled in the source code to prevent information leakage and maintain system stability. (Example: Verify proper error logging and response handling to avoid stack traces or sensitive information exposure.)
API Rate Limiting: Check if the source code implements rate limiting mechanisms to prevent abuse and DoS attacks on APIs. (Example: Implement rate limiting middleware or frameworks like Express-rate-limit for Node.js applications.)
Secure Coding Practices: Evaluate the adherence to secure coding practices such as input validation, output encoding, and proper error handling to prevent common vulnerabilities. (Example: Use OWASP guidelines for secure coding practices and conduct code reviews for adherence.)
Dependency Vulnerabilities: Analyze third-party dependencies for known vulnerabilities and ensure they are kept up-to-date with security patches. (Example: Use tools like npm audit or pipenv check to identify and remediate vulnerabilities in dependencies.)
Sensitive Information Exposure: Look for instances where sensitive information such as API keys, credentials, or Personally Identifiable Information (PII) is exposed in the source code. (Example: Ensure sensitive information is stored securely using environment variables or encrypted storage mechanisms.)
Logging and Monitoring: Check if the source code includes proper logging and monitoring functionality to detect and respond to security incidents effectively. (Example: Implement structured logging with tools like Winston for Node.js applications and set up centralized log management with ELK Stack.)