The project structure can be organized as follows:
.github/workflows/ — GitHub Actions workflow definitions for the CI.
data/ — all of the input/output data, usually split further into subfolders for clarity: raw and processed. Data files are NOT version controlled (VC) by Git; instead, every file on every branch is tracked separately with Data Version Control (DVC).
logs/ — all logs produced by the code, together with the model training losses stored as a CSV file (e.g. via the tf.keras.callbacks.CSVLogger callback). This folder is not VC by Git, but DVC can optionally be used here.
mlruns/ — used by MLflow to automatically log all TensorFlow training runs with their artifacts and configs. This folder is not VC by Git, but it should always be kept up to date, especially for trainings on remote servers, so that the full project development history across all branches stays visible.
models/ — stores the models from all branches; each model's name typically includes the branch name so the models are easy to tell apart (I typically keep only one model per branch). NOT VC by Git or DVC.
sql/ — all queries/code needed to easily reproduce raw_input_data.csv for the project/branch (VC by Git).
tools/ — the source code for all helper functions/classes used at every step of the project (training, analysis, evaluation, packaging, monitoring, etc.). This is the general toolset developed for the project: anything that can be generalized and reused belongs here rather than being defined separately in notebooks.
imgs/ — all images related to the current branch, such as the model architecture and training losses (VC by Git).
README.md — information about the current model, approach and changes related to the current branch (VC by Git).
config.py — configuration of the project for the current branch (VC by Git).
train.ipynb — base minimalist training pipeline (VC by Git or not; to be decided).
eval.ipynb — base minimalist evaluation pipeline (VC by Git or not; to be decided).
requirements.txt — Python package requirements for the project (mainly for the CI).
.env — stores all secrets and tokens. NOT VC by Git! Keep this one safe!
.gitignore — base Git configuration listing what Git should not track.
.pre-commit-config.yaml — pre-commit hooks for the project (typically isort, black, etc.), which keep the Python style consistent and format the code automatically.
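The folder layout above can be scaffolded with a short Python script. This is a minimal sketch, not part of the original project: the scaffold function, the DIRECTORIES, FILES and GIT_IGNORED lists, and leaving the two notebooks out are all assumptions. data/ is deliberately not added to .gitignore, because dvc add writes its own .gitignore entry next to each tracked file, while the small .dvc pointer files still have to be committed to Git.

```python
from pathlib import Path

# Folders from the layout; data/ is split into raw/ and processed/.
DIRECTORIES = [
    ".github/workflows",
    "data/raw",
    "data/processed",
    "logs",
    "mlruns",
    "models",
    "sql",
    "tools",
    "imgs",
]

# Top-level files, created empty if missing (hypothetical selection).
FILES = ["README.md", "config.py", "requirements.txt", ".env", ".pre-commit-config.yaml"]

# Hypothetical ignore list: logs/, mlruns/ and models/ stay out of Git and
# .env holds secrets. data/ is omitted on purpose because DVC manages its own
# .gitignore entries. If logs/ is tracked with DVC too, drop it from here.
GIT_IGNORED = ["logs/", "mlruns/", "models/", ".env"]


def scaffold(root: Path) -> None:
    """Create the folder skeleton, placeholder files and .gitignore under root.

    Safe to run repeatedly: existing folders, files and ignore entries are kept.
    """
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        (root / name).touch(exist_ok=True)  # never truncates an existing file
    gitignore = root / ".gitignore"
    lines = gitignore.read_text().splitlines() if gitignore.exists() else []
    # Append only the entries that are not already present.
    lines += [entry for entry in GIT_IGNORED if entry not in lines]
    gitignore.write_text("\n".join(lines) + "\n")
```

Running scaffold(Path(".")) in a fresh repository creates the skeleton; running it again changes nothing, so it can also be used to repair a branch that is missing some folders.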
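The models/ convention (branch name inside the model name, one model per branch) could be encoded in config.py. A hedged sketch, assuming hypothetical helpers current_branch and model_path and a .keras file suffix: it asks Git for the checked-out branch with git rev-parse --abbrev-ref HEAD and falls back to a default when Git or a repository is unavailable.

```python
import re
import subprocess
from pathlib import Path


def current_branch(default: str = "unknown") -> str:
    """Return the checked-out Git branch name, or `default` outside a repository.

    Note: in a detached-HEAD checkout Git reports the literal name "HEAD".
    """
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return default  # Git not installed, or not inside a repository
    return result.stdout.strip() or default


def model_path(branch: str, models_dir: Path = Path("models"), suffix: str = ".keras") -> Path:
    """Build one deterministic model file name per branch (hypothetical scheme)."""
    # Replace characters such as "/" in "feature/x" so the name stays a single file.
    safe_name = re.sub(r"[^A-Za-z0-9._-]+", "-", branch)
    return models_dir / f"model_{safe_name}{suffix}"
```

Because the path depends only on the branch, saving a newly trained model overwrites the previous one, which keeps a single model per branch; for example, model_path("feature/new-loss") gives models/model_feature-new-loss.keras.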
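For sql/, the point is that raw_input_data.csv can always be regenerated from version-controlled queries. The original does not say which database is queried, so this sketch swaps in Python's built-in sqlite3 module as a stand-in; export_query_to_csv and the .sql file name used with it are hypothetical. With another database, only the connection object would change, as long as its driver follows the same DB-API cursor interface.

```python
import csv
import sqlite3
from pathlib import Path


def export_query_to_csv(conn: sqlite3.Connection, sql_file: Path, csv_file: Path) -> int:
    """Run the single query stored in sql_file and write the result, with a header row, to csv_file.

    Returns the number of data rows written.
    """
    query = sql_file.read_text()
    cursor = conn.execute(query)  # sqlite3 accepts exactly one statement here
    header = [column[0] for column in cursor.description]
    rows = cursor.fetchall()
    with csv_file.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

Keeping the query in a file under sql/ rather than inline in a notebook means the exact snapshot used for a branch can be rebuilt from that branch's Git history.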
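Secrets in .env are commonly loaded with the python-dotenv package (load_dotenv). To show what that involves without the dependency, here is a simplified stdlib-only reader; load_env_file is a hypothetical helper that supports KEY=VALUE lines, # comments and one pair of surrounding quotes, but not multi-line values or variable expansion.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: Path, override: bool = False) -> Dict[str, str]:
    """Read KEY=VALUE pairs from a .env file, export them to os.environ and return them.

    Existing environment variables win unless override=True.
    """
    values: Dict[str, str] = {}
    if not path.exists():
        return values  # a missing .env is not an error (e.g. on CI)
    for raw_line in path.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Drop one pair of matching surrounding quotes: "abc" or 'abc'.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

The code then reads secrets through os.environ, so the values themselves never appear in notebooks or in any Git-tracked file.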
You can find the link to this project: HERE