GitHub API: Repositories Contributed To
Listing Repositories You've Contributed To via GitHub API
Discover how to programmatically retrieve a list of all GitHub repositories you've contributed to, including those you don't own, using the GitHub API.
The GitHub API is a powerful tool for interacting with GitHub programmatically. While it's straightforward to list repositories owned by a user or organization, identifying all repositories a user has contributed to (including forks, pull requests, and commits to others' projects) requires a slightly different approach. This article will guide you through using the GitHub API to fetch this comprehensive list, focusing on practical methods and common pitfalls.
Understanding GitHub Contributions
For the purposes of this article, 'contributed to' is interpreted broadly. It can include:
- Commits: Direct commits to a repository's default branch.
- Pull Requests: Opening or commenting on pull requests.
- Issues: Opening or commenting on issues.
- Forks: Creating a fork of a repository (though this doesn't always imply direct code contribution back to the original).
The GitHub API doesn't have a single endpoint that directly returns 'all repositories a user contributed to' in the same way it does for 'owned repositories'. Instead, we need to leverage different API endpoints and potentially combine their results to get a complete picture. The most reliable way to find repositories with direct code contributions is to search for commits made by the user.
flowchart TD
    A[Start] --> B{Authenticate with GitHub API}
    B --> C[Search Commits by Author]
    C --> D{Extract Repository Information}
    D --> E[Handle Pagination]
    E --> F{Aggregate Unique Repositories}
    F --> G[End]
Workflow for finding contributed repositories via GitHub API.
Method 1: Searching Commits by Author
The most effective way to find repositories where a user has made direct code contributions is to use the Search API, specifically searching for commits. This method allows you to filter commits by author and then extract the repository information from the search results. This approach requires a personal access token with the repo scope for private repositories or public_repo for public ones.
curl -H "Authorization: token YOUR_GITHUB_TOKEN" \
"https://api.github.com/search/commits?q=author:YOUR_GITHUB_USERNAME&per_page=100"
Example cURL command to search for commits by a specific author.
The response from this endpoint is a JSON object whose items array lists the matching commits. Each commit item includes details about the commit, the committer, and, crucially, the repository it belongs to. You'll need to parse this response and extract the repository object from each commit. Remember to handle pagination: the API returns 30 items per page by default and up to 100 when per_page is set, and the Search API serves at most 1,000 results per query.
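As a minimal sketch of the parsing step described above, the snippet below pulls the repository object out of each commit item. The sample payload is a hand-written, heavily abbreviated stand-in for a real response (real items carry many more fields), and extract_repositories is a hypothetical helper name, not part of any library.

```python
def extract_repositories(search_response):
    """Return the `repository` object of every commit item, in order."""
    return [item["repository"] for item in search_response.get("items", [])]


# Hand-written, abbreviated stand-in for a /search/commits response.
sample_response = {
    "total_count": 3,
    "incomplete_results": False,
    "items": [
        {"sha": "a1", "repository": {"id": 1, "full_name": "octocat/hello-world"}},
        {"sha": "b2", "repository": {"id": 2, "full_name": "octocat/spoon-knife"}},
        {"sha": "c3", "repository": {"id": 1, "full_name": "octocat/hello-world"}},
    ],
}

repos = extract_repositories(sample_response)
print([repo["full_name"] for repo in repos])
```

Note that the same repository appears once per matching commit, so the extracted list still needs deduplication before it is useful.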
Method 2: Leveraging User Events (Less Reliable for Code Contributions)
Another approach, though less direct for code contributions, is to examine a user's public events. The GET /users/{username}/events/public endpoint provides a stream of public activity for a user. You can filter these events for types like PushEvent (for commits), PullRequestEvent, or IssuesEvent. While this can give you a broader sense of activity, it's less precise for identifying repositories with direct code contributions and only covers public activity.
curl "https://api.github.com/users/YOUR_GITHUB_USERNAME/events/public?per_page=100"
Example cURL command to fetch a user's public events.
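The event filtering described above could look roughly like the sketch below. It assumes each event has a type field and a repo object whose name is in owner/repo form; the sample events are hand-written and abbreviated, and repos_from_events is a hypothetical helper name.

```python
CONTRIBUTION_EVENT_TYPES = {"PushEvent", "PullRequestEvent", "IssuesEvent"}


def repos_from_events(events, wanted_types=CONTRIBUTION_EVENT_TYPES):
    """Collect the names of repositories touched by the selected event types."""
    names = set()
    for event in events:
        if event.get("type") in wanted_types:
            names.add(event["repo"]["name"])
    return names


# Hand-written, abbreviated stand-in for an events response.
sample_events = [
    {"type": "PushEvent", "repo": {"id": 1, "name": "octocat/hello-world"}},
    {"type": "WatchEvent", "repo": {"id": 2, "name": "octocat/spoon-knife"}},
    {"type": "IssuesEvent", "repo": {"id": 3, "name": "octocat/git-consortium"}},
    {"type": "PushEvent", "repo": {"id": 1, "name": "octocat/hello-world"}},
]

event_repos = repos_from_events(sample_events)
print(sorted(event_repos))
```

Using a set here means repeated pushes to the same repository collapse into a single entry, and unrelated activity such as starring (WatchEvent) is ignored.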
Aggregating and Deduplicating Results
Regardless of the method chosen, you will likely receive duplicate repository entries across different API calls or paginated results. It's crucial to aggregate all unique repository objects into a final list. A common pattern is to store unique repository identifiers (like full_name or id) in a set and then add the full repository object to a list only if it hasn't been seen before.
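The set-based pattern just described can be sketched as follows. The input list is hand-written sample data, and unique_repositories is a hypothetical helper name; the key parameter is an assumption that lets the caller choose between full_name and id as the identifier.

```python
def unique_repositories(repos, key="full_name"):
    """Keep the first occurrence of each repository, preserving input order."""
    seen = set()
    unique = []
    for repo in repos:
        identifier = repo[key]
        if identifier not in seen:
            seen.add(identifier)
            unique.append(repo)
    return unique


# Hand-written sample: the first and third entries are the same repository.
collected = [
    {"id": 1, "full_name": "octocat/hello-world"},
    {"id": 2, "full_name": "octocat/spoon-knife"},
    {"id": 1, "full_name": "octocat/hello-world"},
]

deduplicated = unique_repositories(collected)
print([repo["full_name"] for repo in deduplicated])
```

Keying on id instead of full_name is the safer choice if a repository may have been renamed or transferred between requests, since the numeric id stays the same.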
Python Example
import requests

GITHUB_TOKEN = "YOUR_GITHUB_TOKEN"
USERNAME = "YOUR_GITHUB_USERNAME"

headers = {"Authorization": f"token {GITHUB_TOKEN}"}
contributed_repos = {}  # full_name -> repository object (keys deduplicate)

# The Search API serves at most 1,000 results per query (10 pages of 100);
# requesting a page beyond that returns an HTTP error.
MAX_PAGES = 10

for page in range(1, MAX_PAGES + 1):
    url = f"https://api.github.com/search/commits?q=author:{USERNAME}&per_page=100&page={page}"
    response = requests.get(url, headers=headers)
    response.raise_for_status()  # Raise an exception for HTTP errors
    data = response.json()

    if not data['items']:
        break  # No more results

    for item in data['items']:
        repo = item['repository']
        contributed_repos[repo['full_name']] = repo

print(f"Found {len(contributed_repos)} unique contributed repositories.")
for repo_name, repo_data in contributed_repos.items():
    print(f"- {repo_name} (URL: {repo_data['html_url']})")
JavaScript (Node.js) Example
const axios = require('axios');

const GITHUB_TOKEN = 'YOUR_GITHUB_TOKEN';
const USERNAME = 'YOUR_GITHUB_USERNAME';

const headers = {
  'Authorization': `token ${GITHUB_TOKEN}`,
  'Accept': 'application/vnd.github.v3+json'
};

// GitHub Search API has a limit of 1000 results for commits (10 pages of 100)
const MAX_PAGES = 10;

async function getContributedRepos() {
  const contributedRepos = new Map(); // full_name -> repository object
  let page = 1;
  let hasMore = true;

  while (hasMore) {
    const url = `https://api.github.com/search/commits?q=author:${USERNAME}&per_page=100&page=${page}`;
    try {
      const response = await axios.get(url, { headers });
      const data = response.data;
      if (data.items.length === 0) {
        break;
      }
      for (const item of data.items) {
        const repo = item.repository;
        contributedRepos.set(repo.full_name, repo);
      }
      page++;
      // Stop after the last page the Search API will serve
      if (page > MAX_PAGES) {
        if (data.total_count > MAX_PAGES * 100) {
          console.warn('Reached 1000 commit search results limit. Some repositories might be missed.');
        }
        hasMore = false;
      }
    } catch (error) {
      console.error('Error fetching commits:', error.message);
      hasMore = false;
    }
  }

  console.log(`Found ${contributedRepos.size} unique contributed repositories.`);
  for (const [repoName, repoData] of contributedRepos.entries()) {
    console.log(`- ${repoName} (URL: ${repoData.html_url})`);
  }
}

getContributedRepos();