This documentation describes the DSA Transparency Database Research API. Its endpoints are designed to enable programmatic access to and queries of statements of reasons (SORs) for academic and policy research into platforms’ content moderation practices.
By providing specialized access to search and analyse data within the statement_index of the DSA Transparency Database in OpenSearch, the Research API supports a wide range of technically advanced research and investigative applications. In enabling programmatic analysis, the DSA Transparency Database Research API complements the other analytical tools of the DSA Transparency Database, namely its public dashboard for quick exploration and visualisation of the data and the dsa-tdb analytical package enabling advanced analysis of individually downloaded statements of reasons.
The DSA Transparency Database Research API allows interested stakeholders with the relevant technical knowledge to retrieve specific subsets of data from the OpenSearch statement_index of the DSA Transparency Database and to run complex queries based on their research interests. It is particularly suited to longitudinal and cross-platform studies, i.e. the systematic investigation of trends and patterns in the data.
In line with the DSA Transparency Database data retention policy, the statement_index only contains statements of reasons submitted by platforms within the last 6 months. Older statements of reasons are not available through the Research API endpoints. The DSA Transparency Database Research API endpoints are specifically designed for programmatic statistical and pattern analysis, NOT for bulk data collection. You can find an overview of other tools to analyse the data in the DSA Transparency Database here.
1. Create an EU Login Account. Please find the instructions to create an EU Login Account here.
2. Visit the DSA Transparency Database Page by clicking here.
3. Contact the DSA Helpdesk at CNECT-DSA-HELPDESK@ec.europa.eu with your EU Login details and express your interest in obtaining an authentication token for the Research API. The DSA Helpdesk will process your request and update your account with the appropriate permissions.
4. Log into the DSA Transparency Database website with your EU Login Account and test your access with basic queries.
1. By receiving your authentication token, you agree to use it responsibly and within the limitations specified in this documentation.
2. You must keep your authentication token confidential and not share it with any third party. You are solely and entirely responsible for all uses of your authentication token.
3. Limits are placed on the number of API requests you can make using your authentication token. You agree to, and will not attempt to circumvent, such limitations. Exceeding these limits will lead to your authentication token being temporarily blocked from making further requests.
4. The maximum response size of an API request is 5MB.
5. The maximum execution time of an API request is 30 seconds.
6. The maximum result size is 1000 rows per query and there is no pagination support.
7. In line with the DSA Transparency Database data retention policy, the statement_index only contains statements of reasons submitted by platforms within the last 6 months. As such, older statements are not available through these API endpoints.
8. All endpoints are read-only. No modifications to the statement_index data are possible through these endpoints.
9. The Research endpoints are NOT intended for downloading large volumes of individual statements of reasons. The data download section of the website enables bulk data download.
Endpoint | Method | Description | Use Case |
---|---|---|---|
https://transparency.dsa.ec.europa.eu/api/v1/research/search | POST | Complex search using OpenSearch DSL | Detailed filtering and complex queries |
https://transparency.dsa.ec.europa.eu/api/v1/research/sql | POST | SQL-like queries for analysis | Statistical analysis and aggregations |
https://transparency.dsa.ec.europa.eu/api/v1/research/count | POST | Count documents matching query | Quick statistics and volume analysis |
https://transparency.dsa.ec.europa.eu/api/v1/research/query | POST | Search using OpenSearch DQL | Domain-specific querying |
https://transparency.dsa.ec.europa.eu/api/v1/research/aggregates/{date}[/{fields}] | GET | Aggregated statistics by date | Trend analysis and patterns |
https://transparency.dsa.ec.europa.eu/api/v1/research/labels | GET | Available label definitions | Understanding classification values |
https://transparency.dsa.ec.europa.eu/api/v1/research/platforms | GET | Platform information | Platform metadata and identifiers |
All endpoints require authentication using a Bearer token. See How to get access for the process of obtaining an authentication token. All requests must use HTTPS.
Header Format:
Authorization: Bearer <your-token>
Base URL: All Research API endpoints are accessible under the base URL:
https://transparency.dsa.ec.europa.eu/api/v1/research
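As a minimal sketch, a request carrying the Bearer header could be prepared in Python with the standard library alone. The helper name `build_request`, the `Accept` and `Content-Type` headers, and the placeholder token are assumptions made for illustration; they are not specified by this documentation. The request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://transparency.dsa.ec.europa.eu/api/v1/research"


def build_request(path, token, body=None):
    """Prepare an authenticated request (hypothetical helper, not part of the API).

    A JSON body turns the request into a POST; without a body it is a GET.
    """
    # Accept and Content-Type are assumed to be appropriate for a JSON API.
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    method = "POST" if body is not None else "GET"
    return urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}", data=data, headers=headers, method=method
    )


# "<your-token>" is a placeholder; substitute the token issued to your account.
req = build_request("labels", "<your-token>")
print(req.get_method(), req.full_url)
```

Sending the prepared request would then be a call such as `urllib.request.urlopen(req, timeout=30)`, where the timeout simply mirrors the 30-second execution limit stated in the terms.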
For detailed information on how to construct OpenSearch DSL queries, refer to the OpenSearch Query DSL Documentation.
The statement_index contains the following fields that can be used in your queries:
Field | Type | Description |
---|---|---|
account_type | keyword | Type of account |
application_date | date | Date of application of a moderation decision |
automated_decision | keyword | Automated decision indicator |
automated_detection | boolean | Whether detection was automated |
category | keyword | Statement category |
category_addition | text | Additional category information |
category_specification | text | Category specification details |
content_date | date | Date of the content |
content_language | keyword | Language of the content |
content_type | text | Type of content |
content_type_other | text | Other content type details |
content_type_single | keyword | Single content type identifier |
created_at | date | Creation timestamp |
decision_account | keyword | Account decision |
decision_facts | text | Decision facts |
decision_ground | keyword | Ground for decision |
decision_monetary | keyword | Monetary decision |
decision_monetary_other | text | Other monetary decision details |
decision_provision | keyword | Decision provision |
decision_visibility | text | Visibility decision |
decision_visibility_other | text | Other visibility decision details |
decision_visibility_single | keyword | Single visibility decision |
id | long | Unique identifier |
illegal_content_explanation | text | Explanation of illegal content |
illegal_content_legal_ground | text | Legal ground for illegal content |
incompatible_content_explanation | text | Explanation of incompatible content |
incompatible_content_ground | text | Ground for incompatible content |
method | keyword | Method used |
platform_id | long | Platform identifier |
platform_name | text | Name of the platform |
platform_uuid | text | Platform UUID |
platform_vlop | boolean | Platform VLOP status |
puid | text | PUID identifier |
received_date | date | Date received |
source_identity | text | Identity of the source |
source_type | keyword | Type of source |
territorial_scope | text | Territorial scope |
url | text | URL reference |
uuid | text | UUID identifier |
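The field type matters when choosing a query clause: in OpenSearch, keyword, boolean and numeric fields are compared as exact values, while text fields are analysed for full-text search. The sketch below picks a `term` or `match` clause accordingly. `FIELD_TYPES` is only an excerpt of the table above and `clause_for` is a hypothetical helper, so treat this as an illustration rather than a required pattern.

```python
# Excerpt of the field table above; extend it with the fields you need.
FIELD_TYPES = {
    "category": "keyword",
    "decision_ground": "keyword",
    "automated_detection": "boolean",
    "platform_id": "long",
    "platform_name": "text",
    "decision_facts": "text",
}


def clause_for(field, value):
    """Return a query clause suited to the field's type (hypothetical helper)."""
    field_type = FIELD_TYPES[field]  # raises KeyError for fields not listed above
    if field_type == "text":
        # Analysed full-text field: use a match query.
        return {"match": {field: value}}
    # keyword, boolean, long: exact-value lookup with a term query.
    return {"term": {field: value}}


print(clause_for("category", "STATEMENT_CATEGORY_SCAMS_AND_FRAUD"))
print(clause_for("platform_name", "Example Platform"))
```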
This endpoint enables complex search using OpenSearch DSL. For detailed information on how to construct OpenSearch DSL queries, refer to the OpenSearch Query DSL Documentation.
POST https://transparency.dsa.ec.europa.eu/api/v1/research/search
Total hit counts are tracked (track_total_hits is enabled).
{
"query": {
"bool": {
"must": [
{
"match": {
"category": "STATEMENT_CATEGORY_SCAMS_AND_FRAUD"
}
}
],
"filter": [
{
"range": {
"received_date": {
"gte": "2024-01-01",
"lte": "2024-06-30"
}
}
}
]
}
}
}
{
"query": {
"bool": {
"must": [
{
"terms": {
"territorial_scope": [
"DE",
"FR",
"IT"
]
}
}
],
"filter": [
{
"term": {
"decision_ground": "DECISION_GROUND_ILLEGAL_CONTENT"
}
}
]
}
}
}
{
"query": {
"bool": {
"must": [
{
"term": {
"automated_detection": true
}
}
],
"should": [
{
"term": {
"decision_ground": "DECISION_GROUND_ILLEGAL_CONTENT"
}
},
{
"term": {
"decision_ground": "DECISION_GROUND_INCOMPATIBLE_CONTENT"
}
}
],
"minimum_should_match": 1
}
}
}
{
"query": {
"bool": {
"must": [
{
"term": {
"platform_id": 22
}
},
{
"term": {
"category": "STATEMENT_CATEGORY_ANIMAL_WELFARE"
}
}
],
"filter": [
{
"range": {
"received_date": {
"gte": "2024-01-01",
"lte": "2024-06-30"
}
}
}
]
}
}
}
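The request bodies above share one shape: a `bool` query whose `must` clauses select statements and whose `filter` restricts `received_date`. As a rough sketch, that shape can be assembled in Python; `scoped_query` is a hypothetical helper name, and the values are taken from the last example.

```python
import json


def scoped_query(clauses, date_from, date_to):
    """Wrap query clauses in a bool query limited to a received_date range."""
    return {
        "query": {
            "bool": {
                "must": list(clauses),
                "filter": [
                    {"range": {"received_date": {"gte": date_from, "lte": date_to}}}
                ],
            }
        }
    }


body = scoped_query(
    [
        {"term": {"platform_id": 22}},
        {"term": {"category": "STATEMENT_CATEGORY_ANIMAL_WELFARE"}},
    ],
    "2024-01-01",
    "2024-06-30",
)
print(json.dumps(body, indent=2))
```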
{
"status": "success",
"data": {
"took": 476,
"timed_out": false,
"num_reduce_phases": 2,
"_shards": {
"total": 640,
"successful": 640,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 177303,
"relation": "eq"
},
"max_score": 31.229053,
"hits": [
{
"_index": "statement_product_640_2",
"_id": "26271619559",
"_score": 31.229053,
"_source": {
"id": 26271619559,
"platform_name": "Example Platform",
"category": "STATEMENT_CATEGORY_ILLEGAL_OR_HARMFUL_SPEECH",
"decision_ground": "DECISION_GROUND_ILLEGAL_CONTENT",
"content_type": [
"CONTENT_TYPE_TEXT"
],
"territorial_scope": [
"AT",
"BE",
"DE"
]
}
}
]
}
}
}
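A response of this shape can be reduced to the values most analyses need: the total, its relation, and the `_source` documents. The sketch below assumes the response has already been decoded into a Python dict; `summarise_hits` is a hypothetical helper, and the sample is a trimmed copy of the response above. In OpenSearch, a relation of `eq` marks an exact total, while `gte` marks a lower bound.

```python
def summarise_hits(response):
    """Return (total, relation, sources) from a decoded /search response."""
    hits = response["data"]["hits"]
    total = hits["total"]
    sources = [hit["_source"] for hit in hits["hits"]]
    return total["value"], total["relation"], sources


# Trimmed copy of the sample response above.
sample = {
    "status": "success",
    "data": {
        "hits": {
            "total": {"value": 177303, "relation": "eq"},
            "hits": [
                {
                    "_id": "26271619559",
                    "_source": {
                        "id": 26271619559,
                        "platform_name": "Example Platform",
                        "territorial_scope": ["AT", "BE", "DE"],
                    },
                }
            ],
        }
    },
}

total, relation, sources = summarise_hits(sample)
print(total, relation, len(sources))
```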
Searches run against the statement_index only.
This endpoint enables SQL-like queries using OpenSearch SQL functionality. For detailed guidance, refer to the OpenSearch SQL Documentation.
POST https://transparency.dsa.ec.europa.eu/api/v1/research/sql
- Queries are limited to the statement_index.
- Any LIMIT/OFFSET in queries will be automatically replaced with LIMIT 1000 OFFSET 0.
- No HAVING clause.
- No JOIN support (no other indices to join with).
- No CTEs (Common Table Expressions).
- No UNION operations.
- The FROM clause must always be FROM statement_index.
- No pagination (OFFSET is always 0).
For larger result sets:
For more complex analysis needs that exceed OpenSearch SQL capabilities (such as window functions or complex aggregations), consider the dsa-tdb analytical package or the data download section of the website.
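The restrictions listed above can be checked on the client before a query is sent, which avoids spending requests on statements that will be rejected or rewritten. The sketch below is a rough keyword scan, not a SQL parser: it assumes HAVING, JOIN, UNION and CTEs (WITH) are the unsupported constructs, and `check_sql` is a hypothetical helper name.

```python
import re

# Constructs read from the list above as unsupported (an assumption of this sketch).
UNSUPPORTED = ("HAVING", "JOIN", "UNION", "WITH")


def check_sql(query):
    """Return a list of problems found by a rough keyword scan (hypothetical helper)."""
    problems = []
    # Blank out quoted string literals so keywords inside values are ignored.
    stripped = re.sub(r"'[^']*'", "''", query)
    for keyword in UNSUPPORTED:
        if re.search(rf"\b{keyword}\b", stripped, flags=re.IGNORECASE):
            problems.append(f"{keyword} is not supported")
    # Every FROM clause must name statement_index.
    tables = re.findall(r"\bFROM\s+([\w.]+)", stripped, flags=re.IGNORECASE)
    if not tables or any(table.lower() != "statement_index" for table in tables):
        problems.append("FROM clause must be FROM statement_index")
    return problems


print(check_sql("SELECT category, COUNT(*) FROM statement_index GROUP BY category"))
print(check_sql("SELECT * FROM statement_index s JOIN other o ON s.id = o.id"))
```

An empty list means the scan found nothing to object to; it does not guarantee that the endpoint will accept the query.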
1. Comparative platform analysis:
SELECT
platform_name,
decision_ground,
COUNT(*) as decision_count
FROM statement_index
WHERE received_date >= '2024-01-01'
AND received_date <= '2024-06-30'
GROUP BY platform_name, decision_ground
ORDER BY platform_name, decision_count DESC;
2. Automated vs Manual Decision Analysis:
SELECT
content_type_single,
automated_decision,
platform_name,
COUNT(*) as decision_count
FROM statement_index
WHERE received_date = '2024-06-26'
GROUP BY content_type_single, automated_decision, platform_name
ORDER BY decision_count DESC;
3. Basic Temporal Analysis:
SELECT
received_date,
platform_name,
category,
COUNT(*) as statement_count,
AVG(CASE WHEN automated_detection = true THEN 1.0 ELSE 0.0 END) as automation_rate
FROM statement_index
WHERE received_date >= '2024-01-01'
AND received_date <= '2024-06-30'
GROUP BY received_date, platform_name, category
ORDER BY received_date, platform_name;
{
"query": "SELECT * FROM statement_index WHERE platform_name = 'example'",
"format": "json" // Optional: returns results in JSON format
}
{
"schema": [
{
"name": "decision_account",
"type": "keyword"
},
{
"name": "account_type",
"type": "keyword"
},
{
"name": "decision_provision",
"type": "keyword"
}
// ... additional fields
],
"datarows": [
[
null,
null,
"DECISION_PROVISION_PARTIAL_SUSPENSION",
"2024-07-07 01:31:21",
"AUTOMATED_DECISION_PARTIALLY",
"CONTENT_TYPE_PRODUCT",
"a1d9afd8-2fc9-4e29-827b-80578117f200",
null,
null,
"CONTENT_TYPE_PRODUCT",
null,
"The affected listings do not meet the requirements of the Electronical and Electronic Equipment Act (ElektroG – the German WEEE law).",
"bfea46d8e2fe89727d3d351f8818f0f3cd076741f43ecab9d47beda4872fb0f8d1c43b40c943be8e2b33b6b6be998824de1e1f29d9daf27dc4ce22de7db942ac",
"1ebd7d59-6f2f-48b0-92ea-2fe3265b52f5",
"2024-07-07 00:00:00",
null,
null,
"Amazon Store",
21177743029,
"API_MULTI",
"DECISION_VISIBILITY_CONTENT_DISABLED",
true,
null,
"SOURCE_VOLUNTARY",
null,
null,
null,
null,
"DECISION_VISIBILITY_CONTENT_DISABLED",
"Violation of the Electronical and Electronic Equipment Act (ElektroG – the German WEEE law).",
"DE",
null,
28
]
// ... additional rows
]
}
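In a response of this shape, column names live in `schema` and values in `datarows`. A small sketch of pairing them into one dict per row follows; it assumes, as is usual for this OpenSearch SQL response format, that each row lists its values in the same order as `schema`. The `rows_as_dicts` helper and the sample values are hypothetical.

```python
def rows_as_dicts(result):
    """Pair each value in `datarows` with the column name at the same position in `schema`."""
    names = [column["name"] for column in result["schema"]]
    return [dict(zip(names, row)) for row in result["datarows"]]


# Small hypothetical result in the same shape as the response above.
result = {
    "schema": [
        {"name": "platform_name", "type": "text"},
        {"name": "decision_count", "type": "integer"},
    ],
    "datarows": [["Example Platform", 120], ["Another Platform", 45]],
}

for record in rows_as_dicts(result):
    print(record)
```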
{
"took": 170,
"timed_out": false,
"num_reduce_phases": 2,
"_shards": {
"total": 640,
"successful": 640,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 5467899,
"relation": "gte"
},
"max_score": 1,
"hits": [
{
"_index": "statement_product_640_2",
"_id": "21177743029",
"_score": 1,
"_source": {
"id": 21177743029,
"decision_visibility": [
"DECISION_VISIBILITY_CONTENT_DISABLED"
],
"decision_visibility_single": "DECISION_VISIBILITY_CONTENT_DISABLED",
"category_specification": [],
"decision_visibility_other": null,
"decision_monetary": null,
"decision_monetary_other": null
}
}
// ... additional results
]
}
}
This endpoint returns the count of documents matching the provided OpenSearch DSL query.
POST https://transparency.dsa.ec.europa.eu/api/v1/research/count
1. Volume analysis of moderated content:
{
"query": {
"bool": {
"must": [
{
"term": {
"decision_ground": "DECISION_GROUND_ILLEGAL_CONTENT"
}
}
],
"filter": [
{
"range": {
"received_date": {
"gte": "2024-01-01",
"lte": "2024-06-30"
}
}
}
]
}
}
}
2. Analysis of Content Type distribution:
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "content_type_single"
}
}
],
"filter": [
{
"term": {
"platform_vlop": true
}
}
]
}
}
}
{
"status": "success",
"data": {
"count": 9630559766,
"_shards": {
"total": 640,
"successful": 640,
"skipped": 0,
"failed": 0
}
}
}
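When counts feed into statistics, it can be worth confirming that every shard answered before using the number. The sketch below reads the count from a decoded response and raises if any shard failed; `read_count` is a hypothetical helper, and the sample mirrors the response above.

```python
def read_count(response):
    """Return the document count, raising if any shard failed to answer."""
    data = response["data"]
    failed = data["_shards"]["failed"]
    if failed:
        raise RuntimeError(f"{failed} shard(s) failed; the count may be incomplete")
    return data["count"]


sample = {
    "status": "success",
    "data": {
        "count": 9630559766,
        "_shards": {"total": 640, "successful": 640, "skipped": 0, "failed": 0},
    },
}
print(read_count(sample))
```

Dividing two such counts, for example statements on the ground of illegal content over all statements in the same date range, gives a share without downloading any individual statement.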
Performs searches using OpenSearch DQL (Dashboards Query Language). DQL is a simple text-based query language that uses field:value syntax to filter data. This query language resembles the Apache Lucene Query language.
POST https://transparency.dsa.ec.europa.eu/api/v1/research/query
{
"query": "decision_visibility_single: DECISION_VISIBILITY_CONTENT_REMOVED and automated_detection: true"
}
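1. Content Removal Pattern Analysis:
Finds statements where content was removed following automated detection:
decision_visibility_single: DECISION_VISIBILITY_CONTENT_REMOVED and automated_detection: true
2. Regional Analysis:
Finds statements whose territorial scope includes Germany and whose decision ground is illegal content:
territorial_scope: DE and decision_ground: DECISION_GROUND_ILLEGAL_CONTENT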
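Important Notes:
- Filter on a field with a field-value pair (field: value).
- Combine conditions with the Boolean operators and, or, not.
- Wrap a value in double quotes to match an exact phrase: field: "exact phrase"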
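A DQL string of the kind shown above can be assembled from a mapping of fields to values. The sketch below joins `field: value` filters with `and` and wraps values containing spaces in double quotes. `dql_term` and `dql_all` are hypothetical helpers, and values that themselves contain double quotes are deliberately not handled.

```python
def dql_term(field, value):
    """Render one field: value filter, quoting string values that contain spaces."""
    if isinstance(value, bool):
        rendered = "true" if value else "false"
    elif isinstance(value, str) and " " in value:
        rendered = f'"{value}"'
    else:
        rendered = str(value)
    return f"{field}: {rendered}"


def dql_all(filters):
    """Join several filters with `and`, so every condition must hold."""
    return " and ".join(dql_term(field, value) for field, value in filters.items())


body = {
    "query": dql_all(
        {
            "decision_visibility_single": "DECISION_VISIBILITY_CONTENT_REMOVED",
            "automated_detection": True,
        }
    )
}
print(body["query"])
```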
This endpoint returns aggregated statistics for statements for the specified date. Aggregates in OpenSearch are a powerful way to group and analyze data based on specific fields, similar to SQL's GROUP BY functionality. They help in summarizing and analyzing large datasets by grouping similar data together, calculating metrics, and discovering patterns in the data.
GET https://transparency.dsa.ec.europa.eu/api/v1/research/aggregates/{date}[/{fields}]
GET https://transparency.dsa.ec.europa.eu/api/v1/research/aggregates/2024-06-26/decision_ground__platform_id
GET https://transparency.dsa.ec.europa.eu/api/v1/research/aggregates/2024-06-26/all
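Judging from the example URLs above, the date is written as YYYY-MM-DD and several fields are joined with a double underscore. A short sketch of building such a URL follows; `aggregates_url` is a hypothetical helper name.

```python
from datetime import date

BASE_URL = "https://transparency.dsa.ec.europa.eu/api/v1/research"


def aggregates_url(day, fields=()):
    """Build the aggregates URL for a date and an optional sequence of fields."""
    url = f"{BASE_URL}/aggregates/{day.isoformat()}"
    if fields:
        # Field names are joined with a double underscore, as in the examples above.
        url += "/" + "__".join(fields)
    return url


print(aggregates_url(date(2024, 6, 26)))
print(aggregates_url(date(2024, 6, 26), ["decision_ground", "platform_id"]))
```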
1. Default (total for date):
GET https://transparency.dsa.ec.europa.eu/api/v1/research/aggregates/2024-06-26
Response:
{
"aggregates": [
{
"received_date": "2024-06-26",
"permutation": "received_date:2024-06-26",
"total": 55225872
}
],
"total": 55225872,
"total_aggregates": 1,
"date": "2024-06-26",
"attributes": {
"1": "received_date"
},
"key": "osa__2024-06-26__received_date",
"cache": "hit",
"duration": 0.0019,
"size": 269
}
2. Aggregation by platform:
GET https://transparency.dsa.ec.europa.eu/api/v1/research/aggregates/2024-06-26/platform_id
Response:
{
"aggregates": [
{
"platform_id": 22,
"permutation": "platform_id:22",
"platform_name": "X",
"total": 2783
},
{
"platform_id": 23,
"permutation": "platform_id:23",
"platform_name": "App Store",
"total": 660
}
],
"total": 3443,
"total_aggregates": 2,
"date": "2024-06-26",
"attributes": {
"1": "platform_id"
}
// ... additional metadata
}
3. Aggregation on all fields:
GET https://transparency.dsa.ec.europa.eu/api/v1/research/aggregates/2024-06-26/all
Response:
{
"aggregates": [
{
"automated_decision": true,
"permutation": "automated_decision:true",
"total": 25000
},
{
"platform_id": 22,
"permutation": "platform_id:22",
"platform_name": "X",
"total": 2783
}
// ... results for all other fields
],
"total": 55225872,
"total_aggregates": 12,
"date": "2024-06-26",
"attributes": {
"1": "automated_decision",
"2": "automated_detection",
"3": "category"
// ... all available fields
}
// ... additional metadata
}
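Aggregate responses lend themselves to simple derived measures. As an illustration, the sketch below turns a per-platform aggregation into each platform's share of the day's total. `platform_shares` is a hypothetical helper, and the sample reuses the figures from the platform aggregation example above.

```python
def platform_shares(response):
    """Map each platform_name to its share of the response total."""
    total = response["total"]
    if total == 0:
        return {}
    return {
        row["platform_name"]: row["total"] / total
        for row in response["aggregates"]
        if "platform_name" in row  # skip rows aggregated on other fields
    }


# Figures taken from the platform aggregation example above.
sample = {
    "aggregates": [
        {"platform_id": 22, "platform_name": "X", "total": 2783},
        {"platform_id": 23, "platform_name": "App Store", "total": 660},
    ],
    "total": 3443,
}

for name, share in platform_shares(sample).items():
    print(f"{name}: {share:.1%}")
```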
This endpoint returns all available labels and their corresponding keystone values that can be used for filtering in queries. Keystone values are machine-friendly strings that represent specific categories or attributes in the system. For example, when filtering statements by category in your queries, you would use the keystone value STATEMENT_CATEGORY_ANIMAL_WELFARE rather than the human-readable label "Animal Welfare".
GET https://transparency.dsa.ec.europa.eu/api/v1/research/labels
Response:
{
"decision_visibilities": {
"DECISION_VISIBILITY_CONTENT_REMOVED": "Removal of content",
"DECISION_VISIBILITY_CONTENT_DISABLED": "Disabling access to content",
"DECISION_VISIBILITY_CONTENT_DEMOTED": "Demotion of content",
"DECISION_VISIBILITY_CONTENT_AGE_RESTRICTED": "Age restricted content",
"DECISION_VISIBILITY_CONTENT_INTERACTION_RESTRICTED": "Restricting interaction with content",
"DECISION_VISIBILITY_CONTENT_LABELLED": "Labelled content",
"DECISION_VISIBILITY_OTHER": "Other restriction (please specify)"
},
"decision_monetaries": {
"DECISION_MONETARY_SUSPENSION": "Suspension of monetary payments",
"DECISION_MONETARY_TERMINATION": "Termination of monetary payments",
"DECISION_MONETARY_OTHER": "Other restriction (please specify)"
},
"decision_provisions": {
"DECISION_PROVISION_PARTIAL_SUSPENSION": "Partial suspension of the provision of the service",
"DECISION_PROVISION_TOTAL_SUSPENSION": "Total suspension of the provision of the service",
"DECISION_PROVISION_PARTIAL_TERMINATION": "Partial termination of the provision of the service",
"DECISION_PROVISION_TOTAL_TERMINATION": "Total termination of the provision of the service"
}
// ... additional label categories
}
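Since queries expect the machine-friendly value rather than the human-readable label, a lookup from label to value can be handy. The sketch below searches one group of a decoded labels response; `key_for_label` is a hypothetical helper, and the sample is an excerpt of the response above.

```python
def key_for_label(labels, group, label):
    """Look up the machine-friendly value for a human-readable label in one group."""
    for key, text in labels[group].items():
        if text.casefold() == label.casefold():
            return key
    raise KeyError(f"no label {label!r} in group {group!r}")


# Excerpt of the labels response above.
labels = {
    "decision_visibilities": {
        "DECISION_VISIBILITY_CONTENT_REMOVED": "Removal of content",
        "DECISION_VISIBILITY_CONTENT_DISABLED": "Disabling access to content",
    },
    "decision_monetaries": {
        "DECISION_MONETARY_SUSPENSION": "Suspension of monetary payments",
    },
}

print(key_for_label(labels, "decision_visibilities", "Removal of content"))
```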
This endpoint returns a list of all platforms in the system along with their unique identifier and VLOP (Very Large Online Platform) status. The platform IDs can be used for filtering in queries when you need to target specific platforms.
GET https://transparency.dsa.ec.europa.eu/api/v1/research/platforms
These API endpoints are provided as-is and act as direct interfaces to the OpenSearch index.
For more detailed query guidance, refer to the OpenSearch Query DSL Documentation and the OpenSearch SQL Documentation.