Scrapers
CourseCake is sliced (haha get it?) into three parts: a standalone scraper package, scrapers; a database package, database, to store and query what was scraped; and a web app package, fastapi_app, to make the two former packages accessible on the web.
This section covers how to use the standalone scraper package, scrapers.
Quickstart
Here's a simple script to get a dictionary of all of a university's courses for the latest term (only UC Irvine is supported right now). In this dictionary, each course's course code is the key to that course's information.
from coursecake.scrapers.course_scraper import CourseScraper
scraper = CourseScraper().getUciScraper()
courses = scraper.scrape()
instructor_of_course_30000 = courses["30000"].instructor
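Because scrape() returns a plain Python dict, looking up a course code that is not offered this term raises a KeyError. The sketch below shows a defensive lookup with dict.get. instructor_for is a hypothetical helper, and SimpleNamespace objects stand in for real Course objects so the snippet runs without network access.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the dict returned by scraper.scrape();
# real values are Course objects, faked here with SimpleNamespace.
courses = {
    "30000": SimpleNamespace(code="30000", instructor="INSTRUCTOR A"),
}


def instructor_for(courses: dict, code: str, default: str = "unknown") -> str:
    """Return the instructor for a course code, or `default` if the code is absent."""
    course = courses.get(code)
    return course.instructor if course is not None else default


print(instructor_for(courses, "30000"))
print(instructor_for(courses, "99999"))
```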
Documentation
CourseScraper
scrapers.course_scraper
CourseScraper is the main class through which you can access all scraper classes and their functions; you should not have to import any of the other scrapers.
The long-term plan (IF this project continues) is to support multiple universities by having a scraper for each one. Currently, only UC Irvine is supported. Regardless, this is the motivation behind the scrapers module: you should only have to import a few classes for complete usability (CourseScraper, Scraper, and Course).
Thus, CourseScraper has methods that return a university's specific Scraper, so you don't have to import that school's scraper yourself.
Importing CourseScraper:
from coursecake.scrapers.course_scraper import CourseScraper
Get University's Scraper
CourseScraper.getUciScraper() -> Scraper
Returns UC Irvine's specific scraper. All specific scrapers inherit from the Scraper class, so for usability, the expected return value is a Scraper object (although a UciScraper object would also be valid and more specific).
From this Scraper object you can retrieve all of the university's courses, query specific courses, save the courses to a file, and so on. All information returned by the Scraper comes directly from the university's course schedule website, so it is always the latest data, and you need an internet connection so the Scraper can make requests to that website.
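The inheritance arrangement described above is essentially a factory: callers program against Scraper while CourseScraper picks the concrete subclass. Here is a minimal sketch of that pattern, assuming the classes are shaped roughly like this. The class bodies are placeholders and are not CourseCake's actual source.

```python
class Scraper:
    """Hypothetical base class: every university scraper exposes the same methods."""

    def scrape(self) -> dict:
        raise NotImplementedError


class UciScraper(Scraper):
    """Hypothetical UC Irvine scraper; a real one would make HTTP requests."""

    def scrape(self) -> dict:
        # Placeholder: a real implementation would fetch and parse schedule pages.
        return {}


class CourseScraper:
    """Hypothetical facade mirroring the documented CourseScraper.getUciScraper()."""

    def getUciScraper(self) -> Scraper:
        # The declared return type is the base class, so callers rely only on
        # the Scraper interface, even though a UciScraper is returned.
        return UciScraper()


scraper = CourseScraper().getUciScraper()
print(isinstance(scraper, Scraper), type(scraper).__name__)
```

Adding a university under this design would mean one new Scraper subclass plus one new getter on CourseScraper, with no change to code that only uses Scraper methods.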
Scraper
scrapers.scraper
Get all courses
Scraper.scrape() -> dict
Loads all of the university's courses for the latest term as a dictionary of Course objects, keyed by course code. Each course is accessible by its course code (a string).
For any university, .scrape() takes around a minute to complete, since it must iterate over several pages of the course schedule website to collect all courses.
Example usage:
from coursecake.scrapers.course_scraper import CourseScraper

scraper = CourseScraper().getUciScraper()
courses = scraper.scrape()
All course information is stored as a dictionary of Course objects.
instructor_of_course_30000 = courses["30000"].instructor
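Since a full scrape takes around a minute and hits the live website, a long-running program may want to reuse one result for a while instead of re-scraping. The sketch below is a hypothetical CachedScrape wrapper (not part of CourseCake) that caches the return value of any zero-argument scrape function for a time-to-live. A fake function stands in for scraper.scrape so it runs offline.

```python
import time
from typing import Callable, Optional


class CachedScrape:
    """Hypothetical helper: reuse a scrape result for ttl_seconds before re-scraping."""

    def __init__(self, scrape_fn: Callable[[], dict], ttl_seconds: float = 600.0):
        self._scrape_fn = scrape_fn
        self._ttl = ttl_seconds
        self._result: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        now = time.monotonic()
        # Re-scrape on first use or once the cached result is older than the TTL.
        if self._result is None or now - self._fetched_at > self._ttl:
            self._result = self._scrape_fn()  # the slow network call in real use
            self._fetched_at = now
        return self._result


# Fake scrape function that records each call instead of hitting the web.
calls = []


def fake_scrape() -> dict:
    calls.append(1)
    return {"30000": "course data"}


cache = CachedScrape(fake_scrape, ttl_seconds=600)
cache.get()
cache.get()  # served from the cache; fake_scrape is not called again
print(len(calls))
```

With a real scraper you would pass the bound method, e.g. CachedScrape(scraper.scrape, ttl_seconds=600).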
Search specific courses
Scraper.getCourses(args: dict) -> dict
Gets the latest course information (scraped directly from the web) on courses matching the search criteria. Returns the results as a dictionary of Course objects, keyed by course code.
args is a dict in which you specify search parameters and their values. Keys and value formats are the same for all universities (currently only one, UC Irvine lol).
Here are the currently supported arguments:
Key | Value |
---|---|
"code" | String |
"department" | String |
"instructor" | String |
"breadth" | String |
"starttime" | String (ex: 8:00am) |
"endtime" | String (ex: 8:00am) |
"title" | String |
"units" | String or int |
You must specify at least one of the following parameters: code, department, instructor, or breadth.
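If you build args programmatically, it can help to check them before making any requests. The following sketch encodes the rules above: the supported keys from the table and the requirement to include at least one of code, department, instructor, or breadth. validate_args is a hypothetical helper, and getCourses may well do its own checking.

```python
SUPPORTED_KEYS = {"code", "department", "instructor", "breadth",
                  "starttime", "endtime", "title", "units"}
REQUIRED_ANY = {"code", "department", "instructor", "breadth"}


def validate_args(args: dict) -> None:
    """Hypothetical pre-flight check mirroring the documented getCourses(args) rules."""
    unknown = set(args) - SUPPORTED_KEYS
    if unknown:
        raise ValueError(f"unsupported search keys: {sorted(unknown)}")
    if not REQUIRED_ANY & set(args):
        raise ValueError(f"specify at least one of: {sorted(REQUIRED_ANY)}")


validate_args({"department": "compsci", "units": 4})  # passes silently
try:
    validate_args({"units": 4})  # no required key present
except ValueError as err:
    print(err)
```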
Example usage:
from coursecake.scrapers.course_scraper import CourseScraper

args = {
    "department": "compsci",
    "units": 4
}
scraper = CourseScraper().getUciScraper()
courses = scraper.getCourses(args)
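The starttime and endtime values in the table are strings such as 8:00am. Assuming that 12-hour, lowercase am/pm form is what the scraper expects (the table gives only the one example), a small helper can produce it from a datetime.time. to_time_arg is hypothetical.

```python
from datetime import time


def to_time_arg(t: time) -> str:
    """Format a datetime.time like the documented example value '8:00am'."""
    hour12 = t.hour % 12 or 12          # 0 -> 12 (midnight), 13 -> 1, etc.
    suffix = "am" if t.hour < 12 else "pm"
    return f"{hour12}:{t.minute:02d}{suffix}"


args = {
    "department": "compsci",
    "starttime": to_time_arg(time(8, 0)),
    "endtime": to_time_arg(time(13, 30)),
}
print(args)
```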
Course
scrapers.course
A Course object holds all the information you can get on a course, accessible as attributes (ex: Course.instructor).
You can easily serialize a Course instance through its __dict__ attribute (e.g. course.__dict__).
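For instance, a Course can be turned into JSON through its instance __dict__, which is exactly what vars() returns. The Course class below is a simplified, hypothetical stand-in with only four attributes; the real class in coursecake/scrapers/course.py has more. default=str is passed so that any attribute values JSON cannot encode natively fall back to their string form.

```python
import json


class Course:
    """Hypothetical stand-in for coursecake's Course; the real attribute set is larger."""

    def __init__(self, code: str, title: str, instructor: str, status: str):
        self.code = code
        self.title = title
        self.instructor = instructor
        self.status = status


course = Course("30000", "EXAMPLE TITLE", "INSTRUCTOR A", "OPEN")

# vars(course) returns course.__dict__ itself (instance attributes only).
as_dict = vars(course)
payload = json.dumps(as_dict, default=str)
print(payload)
```

A whole result dictionary can be serialized the same way, e.g. json.dumps({code: vars(c) for code, c in courses.items()}, default=str).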
courses is a dictionary of Course objects (as defined in coursecake/scrapers/course.py), keyed by course code. From each Course object, you can obtain that course's data.
Here is an example of printing some course data from courses:
for course in courses.values():
    # This does not print out all attributes, just a select few to avoid clutter
    # see more in Course.__str__(self)
    print(course)
    # print the status of the course; if it is open, closed, full, etc.
    print(course.status)
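Because courses is a dict keyed by course code, an ordinary dict comprehension can filter it. This sketch keeps only courses with a given status. The exact status strings (OPEN, FULL, CLOSED) are assumptions, so the comparison ignores case, and the sample data is made up; courses_with_status is a hypothetical helper.

```python
from types import SimpleNamespace

# Hypothetical sample data; real values come from scraper.scrape() or getCourses().
courses = {
    "30000": SimpleNamespace(status="OPEN"),
    "30010": SimpleNamespace(status="FULL"),
    "30020": SimpleNamespace(status="CLOSED"),
}


def courses_with_status(courses: dict, status: str) -> dict:
    """Return the subset of courses whose status matches `status`, ignoring case."""
    wanted = status.lower()
    return {code: c for code, c in courses.items() if str(c.status).lower() == wanted}


open_courses = courses_with_status(courses, "open")
print(sorted(open_courses))
```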