Lambda: Move S3 file based on content

We have a process that saves a file to an S3 bucket. We needed a Lambda to read the file, parse part of the content, and move the file to the appropriate folder in the bucket. So we set up a Lambda that runs whenever a file is created in the base folder of the bucket, reads the file, and moves it to the appropriate place.
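A minimal sketch of such a handler, assuming the file is a small JSON document and that a hypothetical "doc_type" field decides the destination folder (the field name and folder layout are illustrative, not our exact production code):

import json
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Triggered by an s3:ObjectCreated:* event on the bucket's base folder
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = unquote_plus(record["object"]["key"])  # keys arrive URL-encoded in the event

    # Read and parse the file (adjust the parsing to your real format)
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    doc = json.loads(body)

    # Derive the destination folder from the content
    folder = doc.get("doc_type", "unsorted")
    dest_key = folder + "/" + key.split("/")[-1]

    # S3 has no real "move": copy to the new key, then delete the original
    s3.copy_object(Bucket=bucket, Key=dest_key, CopySource={"Bucket": bucket, "Key": key})
    s3.delete_object(Bucket=bucket, Key=key)
    return {"moved_to": dest_key}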

AWS Certified Solutions Architect – Associate

I’ve been doing a Udemy course as preparation for the AWS Certified Solutions Architect – Associate exam. These are my summary notes.


Exam

  • 130 minutes
  • 60 questions
  • Results range from 100 to 1000; passing score: 720
  • Scenario-based questions

IAM

  • Users
  • Groups
  • Roles
  • Policies

Users are part of Groups. Resources have Roles: e.g., for an instance to connect to S3, it needs to have a role. Users, Groups, and Roles all get their permissions through Policies, which are defined as JSON:

# God mode policy
{
    "Version":"2019-01-01",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "*",
            "Resource": "*"
        }
    ]
}

(Creating policies is not part of the exam)

General

  • IAM is cross-regional
  • "root account" is the account created with the first setup of the AWS account, and has complete Admin access.
  • New users have no permissions until assigned
  • New users are assigned an Access Key Id and a Secret Access Key when created, for API access.

S3

  • Key-value, object-based storage, with metadata and versioning
  • Has access control lists
  • Max 5TB file size
  • Bucket names are a universal namespace (https://s3-{region}.amazonaws.com/{bucket})

Consistency model:

  • Read After Write consistency – an object will be available for read directly after being written
  • Eventual consistency for overwrite PUTS and DELETES

Storage Tiers/Classes

  • S3 Standard – 99.99% availability, 99.999999999% durability, redundancy across devices and facilities, designed to sustain the loss of 2 facilities at the same time.
  • S3 – IA (Infrequent Access) – for data accessed less frequently. Lower storage fee, but has a retrieval fee. S3 One Zone – IA: the same as IA, only in 1 AZ (cheaper).
  • Glacier: Very cheap, archival only. Standard retrieval time takes 3 – 5 hours.

Cross Region Replication (CRR)

  • Requires versioning enabled on the source bucket

CloudFront

  • Edge Location – the location where the content will be cached, per AWS Region (they are not read-only: you can write to them too, and they will replicate to the origin and from there to the other edges)
  • Clearing the cache costs money 🙂
  • Origin – The original file location: S3 bucket, EC2 instance, ELB, or Route53
  • Distribution – all the locations of the Edges you defined
  • Can distribute dynamic, static, streaming and interactive content (Web Distribution: most common, for websites; RTMP – media streaming)

EC2

Placement groups

Two types:

  1. Cluster placement group – A group of instances within a single AZ that need low latency / high throughput (e.g. a Cassandra cluster). Only available for specific instance types.
  2. Spread placement group – A group of instances that need to be placed separately from each other
  • Placement group names must be unique within an AWS account
  • Only available for certain instance types
  • Recommended to use homogeneous instances within a placement group
  • You can’t merge placement groups
  • You can’t move an existing instance into a placement group, only launch it into one

EFS

  • Supports NFSv4
  • Only pay for used storage
  • Scales up to petabytes
  • Supports thousands of concurrent NFS connections
  • Data is stored across multiple AZs within a region

Route 53

DNS overview

  • NS – Name Server record. Meaning, if I go to helloRoute53gurus.com, and I’m the first one to try it in my ISP, then the ISP server will ask the .com TLD if it has an NS record for helloRoute53gurus. The .com will have a record that maps it to ns.awsdns.com. So it’ll go to ns.awsdns.com, which will direct it to Route53.
  • A – short for Address – that’s the most basic record, and it’s the IP for the url
  • TTL – time to live – how long to keep in cache
  • CNAME – resolve one domain to another (can’t be used for ‘naked’ domain names, e.g. ‘google.com’)
  • Alias – unique to Route53; they map resource records to an Elastic Load Balancer, CloudFront, or an S3 bucket configured as a website. They work like a CNAME (www.example.com -> elb1234.elb.amazonaws.com)
  • MX record – email records
  • PTR Records – reverse lookups

ELBs do not have predefined IPv4 addresses; you resolve them using a DNS name. So basically, if you have the domain "example.com" and you want to direct its traffic to an ELB, you need to use an Alias (not a CNAME, because it’s a naked domain name, and not an A record, because it has no IP).
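As a rough illustration, creating such an Alias record with boto3 might look like the sketch below (the hosted zone ID, the ELB's zone ID, and the DNS names are placeholders):

import boto3

route53 = boto3.client("route53")

# An Alias rides on an A record but points at an AWS resource instead of an IP
route53.change_resource_record_sets(
    HostedZoneId="Z111111QQQQQQQ",  # placeholder: the hosted zone of example.com
    ChangeBatch={
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": "example.com.",  # the naked/apex domain
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": "Z35SXDOTRQ7X7K",         # placeholder: the ELB's zone ID
                    "DNSName": "elb1234.elb.amazonaws.com.",  # the ELB's DNS name
                    "EvaluateTargetHealth": False,
                },
            },
        }]
    },
)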

Routing Policies

  • Simple Routing – 1 record with multiple ips addresses, randomly returned. No health checks
  • Weighted Routing – 1 record where N% of the traffic goes to one record, M% to another, and so forth
  • Latency Based Routing – Route 53 will send to the region with the lowest latency
  • Failover Routing – Health check based Primary/Secondary routing: if the primary instance fails (health check = false), directs to the secondary
  • Geolocation Routing – config which geo location goes to which instance
  • Multivalue Answer Routing – Several records, each with IP addresses and a health check per resource. The IPs are returned randomly, so it’s good for dispersing traffic across different resources.

VPC

  • NAT Gateways – scale up to 10Gbps; no need to patch, add security groups, or assign an IP (automatic); they do need to be added to the route table (so traffic can go out via the IGW)
  • Network ACL –
    • It’s like a SG, in the subnet level.
    • Each subnet is associated with one; a newly created ACL blocks all inbound/outbound traffic by default. You can associate multiple subnets with the same ACL, but only 1 ACL per subnet.
    • The traffic rules are evaluated from the lowest value and up.
    • Unlike SGs, NACLs are stateless: opening port 80 for inbound will not allow the outbound responses. If you want to serve traffic on port 80, you have to define both an inbound rule and an outbound rule (otherwise traffic goes in but the responses don’t go out – see the sketch after this list)
    • You can block IP addresses using ACL, you can’t with SG
  • ALB – you need at least 2 public subnets for an Application Load Balancer
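A quick boto3 sketch of the stateless-NACL point above (the ACL ID is a placeholder): serving HTTP needs an explicit inbound rule for port 80 and an explicit outbound rule for the ephemeral ports the responses go out on.

import boto3

ec2 = boto3.client("ec2")
acl_id = "acl-0123456789abcdef0"  # placeholder network ACL ID

# Inbound: allow HTTP requests in on port 80
ec2.create_network_acl_entry(
    NetworkAclId=acl_id, RuleNumber=100, Protocol="6",  # protocol 6 = TCP
    RuleAction="allow", Egress=False,
    CidrBlock="0.0.0.0/0", PortRange={"From": 80, "To": 80},
)

# Outbound: NACLs are stateless, so the responses need their own rule
# covering the ephemeral port range used by clients
ec2.create_network_acl_entry(
    NetworkAclId=acl_id, RuleNumber=100, Protocol="6",
    RuleAction="allow", Egress=True,
    CidrBlock="0.0.0.0/0", PortRange={"From": 1024, "To": 65535},
)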

Application Services

SQS

  • Distributed Pull based Messaging queue
  • Up to 256 KB messages
  • Default retention: 4 days, max 14 days
  • Standard queues promise "at-least-once" delivery; "FIFO" queues promise exactly-once with ordering
  • Can poll with timeout (like kafka)
  • Visibility – once a message is consumed, it is marked "invisible" for 30 seconds (default; max is 12 hours). If it is not deleted within that time frame, it becomes visible again and is redistributed to another consumer (see the sketch after this list).
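The sketch referenced above – the basic consume/visibility cycle with boto3 (the queue URL is a placeholder); the message stays invisible while it is being processed and only disappears once it is explicitly deleted:

import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

# Long-poll for up to 10 seconds; received messages become invisible for 60 seconds
resp = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=1,
    WaitTimeSeconds=10,    # "poll with timeout"
    VisibilityTimeout=60,  # overrides the queue default of 30 seconds
)

for msg in resp.get("Messages", []):
    print("processing", msg["Body"])  # stand-in for real work; if this fails, the message reappears
    # Deleting the message is the "I'm done" signal; otherwise it is redelivered after the timeout
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])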

SWF – Simple Workflow Service

  • Kind of an Amazon ETL/workflow system, with Workers (who process jobs) and Deciders (who control the flow of jobs). The system enables dispatching jobs to multiple workers (which makes it easily scalable), tracking job status, and so forth.
  • SWF keeps track of all the tasks and events in an application (in SQS you’d have to do it manually)
  • Unlike SQS, In SWF a task is assigned only once and never duplicated (What happens if the job fails? IDK).
  • SWF enables you to incorporate human interaction – for example, if someone needs to approve received messages

SNS – Simple Notifications Services

Delivers notifications to:

  • Push notifications
  • SMS
  • Email
  • SQS queues
  • Any HTTP endpoint
  • Lambda functions

Messages are hosted in multiple AZs for redundancy.

Messages are aggregated by Topics, and recipients can dynamically subscribe to Topics.
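A minimal boto3 sketch of that topic/subscription model (the topic name and email address are placeholders):

import boto3

sns = boto3.client("sns")

# Topics aggregate messages; create_topic is idempotent for the same name
topic_arn = sns.create_topic(Name="demo-topic")["TopicArn"]

# Recipients subscribe dynamically; the protocol could also be sms, sqs, http, lambda...
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="me@example.com")

# One publish fans out to every confirmed subscriber of the topic
sns.publish(TopicArn=topic_arn, Subject="hello", Message="something happened")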

Elastic Transcoder

  • Converts video files between formats – like formatting video files into different formats for portable devices

API Gateway

Basically a front API for your Lambda/internal APIs, with Amazon capabilities:

  • API Caching – caching API responses with a TTL
  • Throttling requests to prevent attacks

Kinesis

3 types:

  • Streams – Kafka-like (retention: up to 7 days) – shards are roughly Kafka partitions (see the sketch after this list)
  • Firehose – Fully automated, no consumers, no retention, no shards – can be written to S3 / Elasticsearch
  • Analytics – run SQL queries on Streams/Firehose streams, and write the result to S3 / Elasticsearch
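To make the shards-as-partitions analogy concrete, here is a tiny producer sketch with boto3 (the stream name is a placeholder): records with the same partition key hash to the same shard, much like a Kafka partition key.

import json

import boto3

kinesis = boto3.client("kinesis")

# The partition key is hashed to pick a shard, similar to a Kafka partition key
kinesis.put_record(
    StreamName="demo-stream",  # placeholder stream
    Data=json.dumps({"event": "click", "user": 42}).encode(),
    PartitionKey="user-42",
)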

AWS Re:Invent 2018 – My Top Sessions

I’m planning to upload a different post on each one of the sessions I liked at the Re:Invent 2018, but for now, just to have everything at one place, here is the short list:

SVR322 – From Monolith to Modern Apps: Best Practices

We are a lean team consisting of developers, lead architects, business analysts, and a project manager. To scale our applications and optimize costs, we need to reduce the amount of undifferentiated heavy lifting (e.g., patching, server management) from our projects. We have identified AWS serverless services that we will use. However, we need approval from a security and cost perspective. We need to build a business case to justify this paradigm shift for our entire technology organization. In this session, we learn to migrate existing applications and build a strategy and financial model to lay the foundation to build everything in a truly serverless way on AWS.

SlideShare

ARC337 – Closing Loops and Opening Minds: How to Take Control of Systems, Big and Small

Whether it’s distributing configurations and customer settings, launching instances, or responding to surges in load, having a great control plane is key to the success of any system or service. Come hear about the techniques we use to build stable and scalable control planes at Amazon. We dive deep into the designs that power the most reliable systems at AWS. We share hard-earned operational lessons and explain academic control theory in easy-to-apply patterns and principles that are immediately useful in your own designs.

Slideshare

ARC403 – Resiliency Testing: Verify That Your System Is as Reliable as You Think

In this workshop, we illustrate how to set up your own resiliency testing. We set up a simple three-tier architecture and explore the failure modes with Bash and Python scripts. To participate, you need an account that can run AWS CloudFormation, AWS Step Functions, AWS Lambda, Application Load Balancers, Amazon EC2, Amazon RDS (MySQL), the AWS Database Migration Service, and Route53.

(Sorry, couldn’t find youtube / slides 😦 )

ARC335 – Failing Successfully in the Cloud: AWS Approach to Resilient Design

AWS global infrastructure provides the tools customers need to design resilient and reliable services. In this session, we discuss how to get the most out of these tools.

(Sorry, couldn’t find youtube / slides 😦 )

SRV305 – Inside AWS: Technology Choices for Modern Applications

AWS offers a wide range of cloud computing services and technologies, but we rarely give opinions about which services and technologies customers should choose. When it comes to building our own services, our engineering groups have strong opinions, and they express them in the technologies they pick. Join Tim Bray, senior principal engineer, to hear about the high-level choices that developers at AWS and our customers have to make. Here are a few: Are microservices always the best choice? Serverless, containers, or serverless containers? Is relational over? Is Java over? The talk is technical and based on our experience in building AWS services and working with customers on their cloud-native apps.

Couldn’t find slides, but someone blogged about it here

AWS Re:Invent 2018 – Recap

In November 2018 I was at my first AWS Re:Invent convention in Vegas. This was an amazing experience, which I highly recommend to anyone working with AWS (don’t we all?).

The sheer size of the convention (50K people!), the volume of sessions and products and above all, the amazing diversity of occupations and fields people came from was mind blowing.

Following is a short recap of the lessons I learned my first time at AWS Re:Invent – from registration to what not to miss (and what you can feel free to miss) at the event, and how to survive it.

Question 1: How much does it cost?

The registration fee is $1,800, and staying in a nice hotel for 6 nights was ~$800. In addition, you have 6 days of not getting any work done, plus flights.

Question 2: Is it worth it?

For me, as a DevOps/Developer with our entire fleet hosted on AWS – totally. But I must say that the most valuable things I learned at Re:Invent were not so much the AWS service tutorials/announcements, but the sessions where professionals from around the world shared their experiences with moving/building/expanding to AWS infrastructure. If you want to convince your supervisor why it’s worthwhile, AWS even has a ready justification letter 🙂

Question 3: OK, I’m in. Should I register in advance to sessions? How? Where?

So, registration for sessions is really the weak side of the Re:Invent convention. The website looks like a relic from the 1990s, searching is very hard and unintuitive, and the calendar option only appeared a week after registration opened – seriously, terrible.

After you get over your shock that this is the entry point to the largest developers convention I know of, a few tips:

  • Most big sessions are held more than once, so you can find them on other days/venues
  • A lot of sessions are broadcast live in different venues (and even in the same venue) – so if you couldn’t find a seat, you still have a chance to see it.
  • The system won’t let you schedule 2 sessions less than 30 minutes apart if they are in different venues – take that into consideration.
  • Registration to sessions ends fast. I had all my desired sessions opened in different tabs, and the moment the registration opened I clicked “Register” on each one of them – and still didn’t get a seat in some.
  • New sessions and additional screenings are added all the time during the convention, and people replace and free their seats. Keep your “favourite” lists and check daily if something interesting has opened up.
  • Session levels – anything lower than 300 is very basic. Only go if it’s something totally new for you / you’re new to AWS
  • Session types:
    • Workshops – vary significantly in their value: some of them are really good, but in most of them you’re just following a github-hosted tutorial with 2 AWS personnel going around and assisting with technical issues. I must say most of the workshops weren’t very valuable to me
    • Chalk talks – Most chalk talks I’ve been to had 2 very experienced engineers sharing their experiences on various topics. These were some of my best sessions.

Question 4: What to see?

Re:Invent really has a lot of extracurricular activities (bar crawl, races, 4k runs, the expo, and so forth). I admit I haven’t attended most of them – I arrived late Sunday night and had a 10-hour time difference to get over, so most nights I was in a zombie state, and I’m not a very good networker. If you are (and you’re not jet-lagged to death) – go!

The Expo: I guess you’ve heard all the urban legends of the wonderful land of the expo, where swag is abundant and freely given. Well, it’s true, a lot of things are freely given, but you will have to stand in line for hours for some tech-labeled socks. For the really valuable things, you’d have to compete with people, register to listen to some sales pitch you’re not interested in, and generally waste your time. My recommendation – skip it. If you have a free hour at the Venetian, go have a look – but trust me, no need to plan your schedule around it.

The Quad, however, is waaaay more interesting. You get a chance to play with and build Lego robots and other things!

re:Play Party: I’ve only been to one, but I must say it wasn’t that impressive. I mean, go – it’s already paid for in your ticket – but let’s say I had no remorse about leaving early…

Question 5: What to wear? Where to eat? How to get around?

Unless you’re presenting something – sneakers, jeans, and a t-shirt. Get a light jacket for the over-air-conditioned lecture halls and the rides between venues, but most of the time the temperature is really office-like. (You’ll spend most of your time indoors anyway)

The food halls are enormous, but the food is really good – they always have gluten-free choices, btw! – and food and drinks are abundant, to the point where you get a snack when getting off the shuttles. I haven’t been to any of the breakfasts, only lunches, but I guess the standard is the same. Basically, you’ll only have to pay for dinner out of your own pocket.

Getting around the venues is extremely easy with the shuttles. Before I arrived I heard from a lot of people that in previous years the shuttles were really bad, and that I should rely on Uber to get around – but at least this year I can attest that the shuttles were fast and convenient.

General Tips

  • DO NOT buy coffee at Starbucks. They have (good) coffee/tea/soda stands everywhere around the lecture halls. Save your money and time (the queues are infinite)
  • Constantly refill your water bottle (they have refill stands everywhere)
  • Carry a lip balm on your person. Vegas is dry as hell.
  • There are electrical outlets literally everywhere, and the wifi was surprisingly good.

Monitoring CloudWatch statistics using Grafana, InfluxDB and Telegraf

We’ve started checking out monitoring solutions for our AWS-based infrastructure. We want it to be not-that-expensive and to monitor both infrastructure (CPU, I/O, network…) and application statistics.

We’ve looked into several options, and we’re currently narrowing it down to Grafana-InfluxDB-Telegraf.

The idea is as follows: use Telegraf to pull CloudWatch statistics from Amazon, save them into InfluxDB, and use Grafana to present these statistics and manage alerts and notifications.

(Why not just use the Grafana CloudWatch plugin? Because it doesn’t support notifications, sadly)

Set up the environment

To test everything, we’ve set up a docker env:

Create a network

docker network create monitoring

The Grafana docker

docker run -d -p 3000:3000 --name grafana --net=monitoring -v $PWD:/var/lib/grafana -e "GF_SECURITY_ADMIN_PASSWORD=secret" grafana/grafana

The Influx docker

docker run -p 8086:8086 -d --name influxdb --net=monitoring -v $PWD:/var/lib/influxdb influxdb

Important! Add 127.0.0.1 influxdb to your hosts file (see Sanities for further explanation)

The Kapacitor docker

We’re running it on the same monitoring network, pointing it at InfluxDB and mounting the config file:

docker run -p 9092:9092 -d --name=kapacitor -h kapacitor --net=monitoring -e KAPACITOR_INFLUXDB_0_URLS_0=http://influxdb:8086 -v $PWD/kapacitor.conf:/etc/kapacitor/kapacitor.conf:ro kapacitor

The Telegraf docker

First we need to generate a config file for our needs, so:

docker run --rm telegraf --input-filter cloudwatch --output-filter influxdb config > telegraf.conf

Then we need to fix the region, credentials, and so on (not a lot). Then run the docker:

docker run -d --name=telegraf --net=monitoring -v $PWD/telegraf-aws-influx.conf:/etc/telegraf/telegraf.conf:ro telegraf

and

docker logs -f telegraf

Let’s monitor!

So we have all the services up: Grafana, InfluxDB and Telegraf. By now, Telegraf should be pulling data from AWS CloudWatch and storing it in InfluxDB. So now we need to hook up Grafana to that data stream!

Create a new DataSource from your InfluxDB, with db = telegraf (you’ll have to input it in the DataSource page), and call it influx_cloudwatch

Create a new Dashboard with your influx_cloudwatch data source, and create a new Graph.

Entities problem

As you might have noticed, we now have all these metrics, but we have a problem: we want to monitor our performance by application, and most of the data is available to us with only instanceIds (and these are not fixed, because we use ElasticBeanstalk).

Some of the data measurements, like AWS/ECS, are available with a clusterName tag, which is somewhat similar to our application name (“awseb-${appName}-rAnd0mNum”), but the AWS/EC2 instances only come with an AutoscalingGroup tag, which is not very indicative of our application names (awseb-r-x5r778aw-stack-AWSEBAutoScalingGroup-123Z3RRAFF86). So we need to find a sane way to add the application name to both the EC2 data and the ECS data, so we can build something that makes sense.
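Just to illustrate the normalization we are after, here is a hypothetical Python snippet that derives an application name from both tag styles (the regexes and the lookup table are assumptions based on the naming patterns above; the solution we actually went with is the Kapacitor flow below):

import re

# Hypothetical lookup from Elastic Beanstalk environment id to application name,
# e.g. built once from the Elastic Beanstalk API
ENV_TO_APP = {"x5r778aw": "my-app"}

def app_name_from_cluster(cluster_name):
    # ECS clusterName looks like "awseb-${appName}-rAnd0mNum"
    m = re.match(r"awseb-(.+)-[^-]+$", cluster_name)
    return m.group(1) if m else cluster_name

def app_name_from_asg(asg_name):
    # The AutoScalingGroup name only carries the environment id, so it needs the lookup table above
    m = re.match(r"awseb-[er]-([^-]+)-stack", asg_name)
    env_id = m.group(1) if m else asg_name
    return ENV_TO_APP.get(env_id, env_id)

print(app_name_from_cluster("awseb-my-app-x1y2z3"))                                    # my-app
print(app_name_from_asg("awseb-r-x5r778aw-stack-AWSEBAutoScalingGroup-123Z3RRAFF86"))  # my-app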

So we’re using Kapacitor:

To add a script: kapacitor define ${scriptName} -type stream -tick ${scriptFileName.tick} -dbrp kapacitor_example.autogen

To write to the db: https://docs.influxdata.com/kapacitor/v1.3//nodes/influx_d_b_out_node/

Let’s Alert!

In order to add alerts to our graphs, we first need to add Alert channels. In Grafana, go to Alerting, and add an alerting channel. The easiest one imho is the Telegram alert channel.

Just install Telegram (on your local machine and on your mobile phone), and then go to the BotFather. Create a bot according to the instructions, and after you create it, run /token. It will give you the Bot API Token. The next thing you need is your chatId. To get that, just go to get_id_bot, which will give you your Chat Id.

That’s all you need. Now you can go to one of the graphs, hit ‘Alerts’, and then ‘Notifications’.

Sanities

If things don’t seem to work:

First, log in to your Grafana docker and run curl -G 'http://influxdb:8086/query?pretty=true' --data-urlencode "q=SHOW DATABASES". If the query passes, you can access InfluxDB from the Grafana docker. Now run the same thing from the Telegraf docker.

Also, the reason you should add “influxdb” to your hosts file is this: when we run all the dockers on the same network, they can access each other seamlessly. However, when we open Grafana using our local browser and try to add influxdb as a Data Source, it is all done on the client, which is the host(!) of the dockers. So it doesn’t know what “influxdb” is. That’s why we add it to the hosts file.