Monday, May 22, 2023

Experiments with ECS: 1 - Deploying 1 microservice in ECS

 

NOTE - 

a. Should have ideally kept the RDS database in a Private Subnet

b. Could also have kept the ECS cluster in Private Subnet (Ref URL - https://repost.aws/knowledge-center/ecs-fargate-tasks-private-subnet)


Step 1 - Create a flask application that talks to a locally installed Postgres database

Ref URL - REST API Design Principles: https://www.freecodecamp.org/news/rest-api-best-practices-rest-endpoint-design-examples/


1.1 Create a Flask application


from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello World'

if __name__ == '__main__':
    app.run(debug=True)


Tested @ http://127.0.0.1:5000/


1.2 Connect the application to Postgres


from flask import Flask, abort, jsonify
import psycopg2

app = Flask(__name__)

# Single shared connection/cursor, kept simple for this experiment
myconn = psycopg2.connect(database="DataWarehouseX", user="postgres",
                          password="xxxxx", host="localhost", port="5432")
mycursor = myconn.cursor()

@app.route("/api/1.0/products")
def products_view():
    # Return every row from the product dimension table
    mycursor.execute("SELECT * FROM core.dim_product")
    return jsonify(mycursor.fetchall())

@app.route("/api/1.0/products/<id>")
def product_view(id):
    # Parameterised query avoids SQL injection via the id path parameter
    mycursor.execute("SELECT * FROM core.dim_product WHERE product_id = %s", (id,))
    rows = mycursor.fetchall()
    if not rows:
        abort(404)
    return jsonify(rows)

if __name__ == '__main__':
    app.run(debug=True)


API in action -
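For reference, the endpoints can be exercised with curl like this (the product id below is just an illustrative value, not one from my table):

curl http://127.0.0.1:5000/api/1.0/products
curl http://127.0.0.1:5000/api/1.0/products/P001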



Step 2 - Dockerize the application and run containers locally

Ref URL - https://www.freecodecamp.org/news/how-to-dockerize-a-flask-app/


app.py -

# To connect to the Postgres DB running on the host machine, switched to 'host.docker.internal'


from flask import Flask, abort, jsonify
import psycopg2

app = Flask(__name__)

# Single shared connection/cursor, kept simple for this experiment
myconn = psycopg2.connect(database="DataWarehouseX", user="postgres",
                          password="xxxx", host="host.docker.internal", port="5432")
mycursor = myconn.cursor()

@app.route("/")
def hello_world():
    return 'Hello from docker!'

@app.route("/api/1.0/products")
def products_view():
    # Return every row from the product dimension table
    mycursor.execute("SELECT * FROM core.dim_product")
    return jsonify(mycursor.fetchall())

@app.route("/api/1.0/products/<id>")
def product_view(id):
    # Parameterised query avoids SQL injection via the id path parameter
    mycursor.execute("SELECT * FROM core.dim_product WHERE product_id = %s", (id,))
    rows = mycursor.fetchall()
    if not rows:
        abort(404)
    return jsonify(rows)

if __name__ == '__main__':
    app.run(debug=True)



Dockerfile -

FROM python:3.8-slim-buster
WORKDIR /python-docker
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt
COPY . .
CMD [ "python3", "-m" , "flask", "run", "--host=0.0.0.0"]


requirements.txt -

# Had issues with the Flask version I was using in Spyder. After checking the page below, decided to drop the Flask version pin from requirements.txt
# https://stackoverflow.com/questions/71718167/importerror-cannot-import-name-escape-from-jinja2

# Had issues building the Docker image with psycopg2, hence switched to psycopg2-binary


flask
psycopg2-binary
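With the Dockerfile and requirements.txt in place, building and running the container locally looks roughly like this (the image name rest-api is just my label, not anything prescribed by the setup):

docker build -t rest-api .
docker run -d -p 5000:5000 rest-api
curl http://localhost:5000/api/1.0/products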



Step 3 - Create an RDS Postgres database, load data into it, and switch to using it in the REST API application

Ref URL - https://sakyasumedh.medium.com/deploy-backend-application-to-aws-ecs-with-application-load-balancer-step-by-step-guide-part-1-91935ae93c51

Had to create a publicly accessible database so that I could connect to it from my LVDI and load data, and also connect to it from my dockerized REST API application on my local machine.

Data migration -

Ref URL - https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ConnectToPostgreSQLInstance.html

Exported data from local database table into a CSV file
Imported data from CSV file into RDS database table
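A command-line sketch of that export/import, assuming psql and the same database/table names used above (the CSV file name is illustrative; the RDS endpoint is the one used later in app.py):

psql -h localhost -U postgres -d DataWarehouseX -c "\copy core.dim_product TO 'dim_product.csv' WITH CSV HEADER"
psql -h rest-ecs-db.c6nvu354y8s3.us-east-1.rds.amazonaws.com -U postgres -d DataWarehouseX -c "\copy core.dim_product FROM 'dim_product.csv' WITH CSV HEADER"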

Connectivity from the container to the public internet works out of the box.

Hence, I only had to make the following change to app.py -

myconn = psycopg2.connect(database = "DataWarehouseX", user = "postgres", password = "xxxx", host = "rest-ecs-db.c6nvu354y8s3.us-east-1.rds.amazonaws.com", port = "5432")

Also had to add an inbound rule on the RDS instance's security group to allow connections from anywhere on port 5432.


Step 4 - Create an ECR repo and push images to that repo

Ref URL - https://sakyasumedh.medium.com/deploy-backend-application-to-aws-ecs-with-application-load-balancer-step-by-step-guide-part-2-e81d4daf0a55

Created an ECR repo

Ran aws configure in the Visual Studio terminal, in the project directory where the image is built

Then followed the push commands specified on the ECR console
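Those push commands follow the standard pattern below (account id and repository name are placeholders here; the region matches the rest of this setup):

aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com
docker tag rest-api:latest <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/<repo_name>:latest
docker push <aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/<repo_name>:latest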


Step 5 - Set up an ECS cluster and deploy the application to ECS

Ref URL - https://sakyasumedh.medium.com/deploy-backend-application-to-aws-ecs-with-application-load-balancer-step-by-step-guide-part-3-b8125ca27177

Had to expose port 5000
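In the task definition, that translates to a container port mapping along these lines (a fragment only, not a full task definition):

"portMappings": [
    { "containerPort": 5000, "protocol": "tcp" }
]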




Step 6 - Add a load balancer

Ref URL - https://sakyasumedh.medium.com/setup-application-load-balancer-and-point-to-ecs-deploy-to-aws-ecs-fargate-with-load-balancer-4b5f6785e8f

Had to use port 5000 everywhere (in the ALB listener, in the Target Group, etc.)



Step 7 - Add a custom domain name

https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-to-elb-load-balancer.html

Had to register a domain; registering it through Route 53 automatically creates a public hosted zone.

Creating a CNAME record worked -

api.anandmusings.link pointing to the ALB (rest-ecs-rds-alb-445658067.us-east-1.elb.amazonaws.com)
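A quick way to verify the record and the end-to-end path (remembering from the previous step that the ALB listener is on port 5000):

dig +short api.anandmusings.link
curl http://api.anandmusings.link:5000/api/1.0/products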




An Alias record wasn't working at first for some reason. After deleting and recreating it a few times, that started working too.





Sunday, March 26, 2023

Git Tutorial

A chance conversation with Aishwarya drove a desire to finally further clarify my understanding of git. Hope you find this information useful.


Git is a version control and collaboration tool

- Version control - To allow easy rollback to a previous version of code <you can also achieve this by - creating a copy of the code folder, adding metadata showing what has changed between the 2 copies. Btw, a git repository is also just a folder where all your project files and the related metadata resides>
- Collaboration - To enable collaboration between multiple developers on a single project <you can also achieve this by say zipping the folder and sending it to someone. While merging, we will have to ask what all files have been modified and then manually copy paste the modified code to create a single golden copy of the code>

The question is how to scale the stuff listed within <> -

- Version control - Keeping multiple copies of the entire folder means we would soon run out of disk space
- Collaboration - 1000s of developers across the globe, who might not even know each other, would need to collaborate to create a single golden copy of the code

That is where git comes in.

Reference URL - https://www.crio.do/blog/what-is-git

Reference URL - https://www.nobledesktop.com/learn/git/what-is-git

Reference URL - https://www.quora.com/Does-git-make-a-copy-of-all-my-files-each-time-I-make-a-commit



What is the typical git workflow?
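In outline (a rough text sketch of the flow; the commands are covered step by step in the scenarios below):

working directory  --git add-->  staging area  --git commit-->  local repo  --git push-->  remote repo
remote repo  --git clone / git pull-->  working directory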




Sequence of steps to follow for various scenarios:

Scenario 1 - You want to make changes to an open source project 

Solution -

[IN GITHUB] Step 1 - Fork: Create a copy of the code from the open source code repo A to your own repo B so that you can make changes to it 

[IN GIT BASH ] Step 2 - cd to the laptop location you want to place the code in

[IN GIT BASH ] Step 3 - Clone: Download code from repo B to your local machine

git clone <repo_url>

[In GIT BASH - ONE TIME] Step 4 - Config: Config git

git config --global user.name <your_user_name>
git config --global user.email <your_email>

[In GIT BASH] Step 5 - Code changes - Make code changes 

[In GIT BASH - OPTIONAL] Step 6 - Status - Check which files have been newly created or modified in the working directory 

git status

[In GIT BASH] Step 7 - Add - Start tracking files

git add .

Can also add files individually

git add <file1> <file2> 

Use spaces to separate file names

NOTE - This tells Git to take a snapshot of the contents of all files under the current directory and add them to the staging area / index (think of it as the backstage of a theatre). Why can't we commit the files directly? Let us say you are working on two files, but only one of them is ready to commit. You don't want to be forced to commit both the files, just the one that is ready
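For example (the file names here are purely illustrative):

git add file_ready.py
git status                  # file_ready.py is staged; the other file stays unstaged
git commit -m 'Commit only the file that is ready'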

[In GIT BASH - OPTIONAL] Step 8 - Status - Check that the files added in step 7 are now staged

git status

[In GIT BASH] Step 9 - Commit - Commit changes to local repo

git commit -m 'Commit message summarising the changes made'

[In GIT BASH] Step 10 - Push - Push changes to remote repo

git push -u origin main

NOTE - You will be prompted to enter your GitHub credentials in the browser

NOTE - The default branch used to be called master; it was later renamed to main to move away from master/slave terminology

[IN GITHUB] Step 11 - Pull Request - Create a Pull Request from your repo B back to repo A to merge changes there. 

Sometimes when we merge two branches (from two different repos, as is the case here, or from the same repo, as in the next scenario) and two developers have worked on the same part of a file, you will get a merge conflict. Git will show you both sets of changes and let you decide which one you want to keep.
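A conflicted section of a file looks roughly like this; you keep the version you want (or a mix of both) and delete the marker lines before committing:

<<<<<<< HEAD
price = 100
=======
price = 120
>>>>>>> feature-branch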

---

Scenario 2 - You want to make changes to your own team's codebase (FIRST TIME)

[IN GIT BASH ] Step 1 - cd to the laptop location you want to place the code in

[IN GIT BASH ] Step 2 - Clone: Download code from repo to your local machine

git clone <repo_url>

[In GIT BASH - ONE TIME] Step 3 - Config: Config git

git config --global user.name <your_user_name>
git config --global user.email <your_email>

NEW! [In GIT BASH] Step 4 - Branch - Create and move to a new branch

git branch branch_name
git checkout branch_name

NOTE - You can also do this in only one command using 

git checkout -b branch_name

[In GIT BASH] Step 5 - Code changes - Make code changes 

[In GIT BASH - OPTIONAL] Step 6 - Status - Check which files have been newly created or modified in the working directory 

git status

[In GIT BASH] Step 7 - Add - Start tracking files

git add .

[In GIT BASH - OPTIONAL] Step 8 - Status - Check that the files added in step 7 are now staged

git status

[In GIT BASH] Step 9 - Commit - Commit changes to local repo

git commit -m 'Commit message summarising the changes made'

MODIFIED! [In GIT BASH] Step 10 - Push - Push changes to remote repo

git push -u origin branch_name

MODIFIED! [IN GITHUB] Step 11 - Pull Request - Create a Pull Request from your branch to 'develop' branch to merge changes there

NOTE - Why to the develop branch? We usually set up a CI/CD pipeline to continually deploy changes from the develop branch to the dev environment. Once the changes have been tested in dev, we raise a Pull Request from the develop branch to the main branch to merge the changes into main, and another CI/CD pipeline automatically deploys them to the prod environment.

---

Scenario 3 - You want to make changes to your own team's codebase (ONGOING)

[IN GIT BASH ] Step 1 - cd to the laptop location you had earlier cloned the repo to

NEW! [IN GIT BASH ] Step 2 - Pull: Do a git pull to download the latest code from the remote repo

git pull

[In GIT BASH - ONE TIME] Step 3 - Config: Config git

git config --global user.name <your_user_name>
git config --global user.email <your_email>

[In GIT BASH] Step 4 - Branch - Create and move to a new branch

git branch branch_name
git checkout branch_name

NOTE - You can also do this in only one command using 

git checkout -b branch_name

[In GIT BASH] Step 5 - Code changes - Make code changes 

[In GIT BASH - OPTIONAL] Step 6 - Status - Check which files have been newly created or modified in the working directory 

git status

[In GIT BASH] Step 7 - Add - Start tracking files

git add .

[In GIT BASH - OPTIONAL] Step 8 - Status - Check that the files added in step 7 are now staged

git status

[In GIT BASH] Step 9 - Commit - Commit changes to local repo

git commit -m 'Commit message summarising the changes made'

[In GIT BASH] Step 10 - Push - Push changes to remote repo

git push -u origin branch_name

[IN GITHUB] Step 11 - Pull Request - Create a Pull Request from your branch to 'develop' branch to merge changes there

---

Scenario 4 - You want to push changes to a new GitHub repo

NEW! [IN GITHUB] Step 1 - Create repo: Create a new repo in GitHub

[IN GIT BASH ] Step 2 - cd to the laptop folder where your code exists

NEW! [In GIT BASH] Step 3 - Init: Initialize a new git repo

git init

NOTE - This will create a new hidden folder in the directory called .git

[In GIT BASH - ONE TIME] Step 4 - Config: Config git

git config --global user.name <your_user_name>
git config --global user.email <your_email>

[In GIT BASH - OPTIONAL] Step 5 - Status - Check which files have been newly created or modified in the working directory 

git status

[In GIT BASH] Step 6 - Add - Start tracking files

git add .

Can also add files individually

git add <file1> <file2> 

Use spaces to separate file names

[In GIT BASH - OPTIONAL] Step 7 - Status - Check that the files added in step 6 are now staged

git status

[In GIT BASH] Step 8 - Commit - Commit changes to local repo

git commit -m 'Commit message summarising the changes made'

NEW! [In GIT BASH - ONE TIME] Step 9 - Set Origin - Set remote repo as origin

git remote add origin <remote repo url>

[In GIT BASH] Step 10 - Push - Push changes to remote repo

git push -u origin main

NOTE - You will be prompted to enter your GitHub credentials in the browser


This attachment summarizes the difference in steps one needs to follow for each scenario

Thursday, January 26, 2017

Watching Moneyball


In keeping up with my quest to explore more and more of what data is capable of, I really felt like watching the movie Moneyball today.

I found the streaming URL for the same on Putlockerr. However, for some reason, my Internet connection started acting up. The video kept buffering every few minutes, rendering the entire movie watching experience jerky & inconsistent. So, I decided I must first download the movie so that I could then watch it in peace. Here's how I accomplished that objective -

1) Installed the extension 'Flash Video Downloader' in Google Chrome. It was able to detect the media file on the Putlockerr page and started to download the same.

2) Here's when I ran into my other problem - The download was too slow. Installed 'Download Accelerator Plus', fed it the movie download URL from the extension & lo and behold, was soon able to download the entire movie in a fraction of time it would have taken otherwise.

It was too late to watch the movie by the time I was done, and I had to get up early to take my guys for a walk. However, now that I had the movie safely stored on my laptop, I would be able to watch it over breakfast tomorrow :)


Saturday, January 07, 2017

Hadoop - Data Ingestion Using Flume & Sqoop

Had some fun today experimenting with Flume and Sqoop to ingest data into HDFS. Here's a brief summary of my approach -



Flume is an optional service (unlike HDFS & YARN) available in Hadoop to ingest unstructured data (logs, social media data) into HDFS. 

I wanted to leverage Flume's spooling directory (spooldir) source to copy data from a staging area on my local file system to a directory within HDFS.

My sandbox - A CentOS 6.5 based VM provided by Edureka that came with Hadoop, Flume and Sqoop pre-installed. 

Methodology -

Step 1. Create a sample text file in the /home directory that would be copied to HDFS. 
Useful URLs -
http://www.thegeekstuff.com/2010/09/linux-file-system-structure/?utm_source=tuicool
http://www.howtogeek.com/199687/how-to-quickly-create-a-text-file-using-the-command-line-in-linux/

Step 2. Create a directory within HDFS to store the data ingested by Flume -
hadoop fs -mkdir /data/flume

Step 3. Obtain the standard flume configuration file format - 
https://flume.apache.org/FlumeUserGuide.html

Step 4. Use the above template to create a sample.conf file -
https://drive.google.com/open?id=0B1xeTI1i_SxtTndqTUZfNEhMUm8

Step 5. Navigate to the Flume directory -
cd /usr/lib/flume-ng/apache-flume-1.4.0-bin/bin

Step 6. Execute the Flume Agent -
./flume-ng agent --conf conf --conf-file /usr/lib/flume-ng/apache-flume-1.4.0-bin/conf/sample.conf --name a1 -Dflume.root.logger=INFO,console

Step 7. After the logs indicate that the file has been copied to HDFS, do a Ctrl + C to cancel the Flume process.

My text file was renamed to 'sample_flume.txt.COMPLETED'
I could also see the ingested data in HDFS using -
hadoop fs -ls /data/flume/FlumeData.148730432550
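The sample.conf referenced in Step 4 follows the standard spooldir-to-HDFS pattern; a sketch of what it contains (the spool directory path is illustrative, while the agent name a1 and HDFS path /data/flume match the commands above):

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Spooling directory source - watches a local staging folder for new files
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/flume_staging
a1.sources.r1.channels = c1

# HDFS sink - writes the ingested events under /data/flume
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /data/flume
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.channel = c1

# In-memory channel connecting source and sink
a1.channels.c1.type = memory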





Sqoop is also an optional service within Hadoop. It is used to transfer structured data between HDFS and an RDBMS and, notably, supports transfer in both directions.

My plan was to transfer a table from a MySQL database on my Sandbox to a Hive table sitting on HDFS.

Methodology -

Step 1. Install MySQL on my host & create a sample table.
Useful URLs -
https://support.rackspace.com/how-to/installing-mysql-server-on-centos/
https://www.linode.com/docs/databases/mysql/how-to-install-mysql-on-centos-6

SQL Commands -
show databases;
use test;
show tables;
create table family(Name VARCHAR(100), Age INT);
insert into family values('Anand',35);
insert into family values('Kalpana',40);
select * from family;
quit;

Step 2. Navigate to the Sqoop binary directory -
cd /usr/lib/sqoop-1.4.4/bin

Step 3. Run the command - 
sqoop import --connect jdbc:mysql://localhost:3306/test --username=root --password= --table=family --hive-import --hive-table=family_demo --target-dir=/data/sqoop -m 1

Step 4. Check the results by reviewing the /data/sqoop directory and through HIVE -
hadoop fs -ls /data/sqoop

HIVE QUERIES-
show tables;
select * from family_demo;
quit;

Wednesday, December 21, 2016

Oracle - Data Transfer Between Databases



Recently, I had to import data from one Oracle database into another. I also needed to schedule a job to do this on a recurring basis. Here are the steps I undertook to accomplish this -

Step 1 - Set up a Database Link to the Source Database in the Target Database schema


Since both the Source & Target database were Remote databases, I ran into some issues with defining the Service Names.

I did so using Oracle SQL Developer

Useful Links -

https://www.youtube.com/watch?v=buaSuEMi4lw

(View Parts 1 & 2)
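For reference, the SQL shape of such a link, using a full connect descriptor to sidestep Service Name issues (all names, hosts and credentials below are placeholders):

CREATE DATABASE LINK source_db_link
  CONNECT TO source_user IDENTIFIED BY source_password
  USING '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=source-host)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=source_service)))';

-- Quick sanity check of the link
SELECT * FROM dual@source_db_link;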

Step 3 - Create a Database Schedule & an associated Database Job

The Oracle DBMS_SCHEDULER package allows segregation between Programs, Schedules & Jobs. Since I already had the relevant SQL code compiled as a Stored Procedure, I just had to create a Database Schedule and then an associated Database Job to run that Stored Procedure as per the defined Schedule.

Useful Links -

http://allthingsoracle.com/introduction-to-scheduled-jobs/

https://www.youtube.com/watch?v=detNIFuhOGo
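A sketch of the two calls involved (the schedule name, job name, procedure name and repeat interval are all illustrative):

BEGIN
  DBMS_SCHEDULER.CREATE_SCHEDULE(
    schedule_name   => 'daily_refresh_schedule',
    repeat_interval => 'FREQ=DAILY;BYHOUR=2',
    comments        => 'Run the data transfer every day at 2 AM');

  DBMS_SCHEDULER.CREATE_JOB(
    job_name      => 'daily_refresh_job',
    job_type      => 'STORED_PROCEDURE',
    job_action    => 'target_schema.refresh_from_source',
    schedule_name => 'daily_refresh_schedule',
    enabled       => TRUE);
END;
/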


Sunday, December 04, 2011

Switching to Old Blogger Templates

The instructions that worked for me -

http://www.spiceupyourblog.com/2011/01/how-to-select-old-layout-blogger.html

I did not quite like the new Blogger templates. They seemed to take the focus away from the blog's actual content.


Monday, January 26, 2009

DVD to Xvid


This weekend, after a long hiatus, I resumed work on some of the technical experiments I had since long meant to undertake. Ironically, fever and cold, helped me take that much needed break from work and exercise my technical muscles a little bit. 

The first major experiment I carried out was ripping a DVD and converting it into the Xvid format. A DVD of the movie "Sorry Bhai" (and a really nice movie, I must add) served as the guinea pig for the experiment.

How I did it?

Step 1 - Ripped the DVD using DVD Decrypter.

DVD Decrypter is one of the best free DVD ripping tools available out there. It is "end of life" in the sense that no new versions of it are under development, but version 3.5.4, which I used, did the job fairly well. It is especially renowned for its ability to copy protected DVDs and to remove the DVD's region protection while ripping.



How to use DVD Decrypter?

Download DVD Decrypter from http://www.dvddecrypter.org.uk/. It is a small 878 KB download (proving the adage that great things do come in small packages).

Install it & run it.

Load the DVD you want ripped in the DVD drive. DVD Decrypter will automatically detect it.

Switch to 'File' mode by clicking on "Mode" > "File".

By default, it will select all the relevant files. Alternatively, you can manually select files from the list.

You can know more about exactly which files to rip if you read about the file structure of DVD Videos @ the websites like -

http://stream.uen.org/medsol/dvd/pages/dvd_format_filestructure.html

http://club.cdfreaks.com/f72/tutorial-dvd-video-file-structure-77646/

http://www.dvd-replica.com/DVD/data-2.php

Once you have selected the relevant files, you can specify a destination folder (make sure that there is ample free space available wherever you are choosing to save the ripped files).

The final action you need to undertake is to click on the large DVD-to-HDD icon to actually initiate the ripping procedure. The entire rip does not take more than 5-10 minutes (at least, that is how long it took for the DVD I ripped).

Step 2 - Convert the ripped files to the Xvid (or DivX) format.

For this, you use a tool called AutoGK. AutoGK (as the name suggests) completely automates the MPEG-2 to MPEG-4 conversion process.



How to use AutoGK?

In the "Input File" section, just select the IFO file for the first video tile set (would normally be the VTS carrying the movie) - VTS_01_0.IFO. You can also select an individual VOB file from the title set to encode.In the "Output File" section, specify the name and destination of the output AVI.

You don't need to do anything in particular with the "Audio Track" and "Subtitle Track" sections. Leaving that to defaults will do.

Just specify the output size - selecting 700 MB (the average size of a DivX or Xvid file) might be a good idea.

Once you have done all this, click "Add Job" and then click "Start".

Please be aware that DVD to DivX (/ Xvid) conversion is a very time-consuming process (it takes two passes in all). So much so that the makers of AutoGK have included a "Shutdown when done" option in the AutoGK interface.

That is it. After a couple of hours, during which you can continue to use your PC for other purposes (though that slows the conversion down a little), you have a pretty handy 700 MB high-quality movie file which can easily be shared as a torrent. Some relevant websites that talk about the conversion process in greater detail are -

http://www.doom9.org/index.html?/autogk.htm

http://www.ehow.com/how_2309420_use-auto-gordian-knot.html

NOTE - Needless to say, both DVD Decrypter and AutoGK have several configuration options you can fiddle with to improve the quality of the final file. However, for a bare-bones rip and conversion, the instructions given above should suffice.

So, happy ripping to all of you :) Feel free to let me know how it works out.