AWS CloudFront Duplicate Content Issue: Solution

AWS CloudFront is a proxy service: a request to CloudFront is served with content hosted on an origin server. You choose your origin server, and along the way CloudFront caches any files it has already served and returns the cached version for future requests.

AWS CloudFront process:

Request -> CloudFront -> Origin Server

Origin Server -> CloudFront -> Response


Searching the internet, other solutions suggest using robots.txt, but the issue with robots.txt is that you have to make changes to your site, plus it blocks access to CSS and JS files too. Googlebot now behaves like a modern browser and needs access to CSS files (to detect whether your site is responsive/mobile friendly).
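For reference, the commonly suggested fix is to serve a blanket-disallow robots.txt like the one below from the CloudFront domain, which blocks CSS and JS along with everything else:

User-agent: *
Disallow: /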

This solution assumes that you only want to serve static files such as CSS, JS and image files.

You can use the AWS CloudFront service to cache a complete domain of your choosing, but this would create duplicates of all the content served by the origin server.

So for this example, suppose you have the domain static.example.com and you are serving all of its content from XXXX.cloudfront.net. This is bad for SEO, as it causes a duplicate content issue with search engines.

To get around the issue: once you have created your CloudFront distribution, go to the “Origins” tab and add a new origin pointing to a non-existent domain, say “non-existent.example.com”.

Once you have added an origin pointing to a domain that doesn’t exist, any request routed to that origin won’t be served by CloudFront.

Now go to the “Behaviors” tab, edit the default behavior, and set its origin to the non-existent one. After this, all requests to XXXX.cloudfront.net should return an error (provided the content is not still cached by edge locations).

Now create a new behavior with “Path Pattern” set to something like *.css for CSS files and set the origin to static.example.com. Repeat this step for every path pattern that you actually want to resolve to a successful request.

The above setup will ensure that your distribution only serves the path patterns that you have included.
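For illustration, the resulting behaviors might look like this (the path patterns are examples; add whichever ones you actually serve):

Path Pattern   Origin
*.css          static.example.com
*.js           static.example.com
*.png          static.example.com
Default (*)    non-existent.example.com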


How to block/restrict the max number of HTTP connections per IP address

I have written a simple PHP file to do that for me, and I run it every minute using a cron job.

The script needs to run as PHP-CLI (cron or command prompt), as it runs system commands and would be blocked if run as an Apache script.

Here is a minimal sketch of the code. It assumes netstat and iptables are available, that HTTP is on port 80, and the limit of 50 connections is an arbitrary example to tune for your server.
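<?php
// Sketch: count current HTTP connections per remote IP with netstat and
// block offenders with iptables. Must run as PHP-CLI (cron), not Apache.
// ASSUMPTIONS: netstat/iptables installed, HTTP on port 80, and the limit
// of 50 connections is a made-up example.

$limit = 50;

// Column 4 of `netstat -ntu` is the local address, column 5 the foreign
// address (ip:port); the first two lines of output are headers.
exec("netstat -ntu | awk 'NR>2 && \$4 ~ /:80\$/ {print \$5}'", $lines);

$counts = array();
foreach ($lines as $line) {
    // IPv4 only, for brevity; strip the port from entries like 192.0.2.1:54321.
    if (preg_match('/^(\d{1,3}(?:\.\d{1,3}){3}):\d+$/', trim($line), $m)) {
        $ip = $m[1];
        $counts[$ip] = isset($counts[$ip]) ? $counts[$ip] + 1 : 1;
    }
}

foreach ($counts as $ip => $n) {
    if ($n > $limit) {
        // Drop all further traffic from the offending IP. Once blocked, the
        // IP shows no new connections, so the rule is not added twice.
        exec('iptables -I INPUT -s ' . escapeshellarg($ip) . ' -j DROP');
        echo "Blocked $ip ($n connections)\n";
    }
}

To run it every minute, add a crontab entry like this (the script path is hypothetical):

* * * * * php /root/limit-connections.php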

The script whose uid is 33 is not allowed to access /tmp owned by uid 0

If safe mode is on in PHP, the owner ID of the PHP file needs to match that of any folder or file it tries to access. So if you use sessions and your PHP is configured to use the /tmp folder for saving session data, it can cause the error shown in the title of this blog post.

To fix it, check whether the following directive is set as follows:

session.save_path = /tmp
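You can check the effective value from the command line, for example (note that the CLI may read a different php.ini than Apache):

php -i | grep session.save_path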

If yes, create the folder:

mkdir -p /var/lib/php5/session

Change the owner of the above folder to the same user the web server runs as; for example, you would say

chown www-data:www-data /var/lib/php5/session

Now edit /etc/php.ini or /etc/php5/apache2/php.ini to use the following setting:

session.save_path = /var/lib/php5/session

Now reload the web server, e.g.

service apache2 reload

or

/etc/init.d/apache2 reload

This should fix the problem.

MySQL Replication Settings

http://www.softwareprojects.com/resources/programming/t-how-to-move-copy-a-live-mysql-database-and-what-1257.html

On the master MySQL server

1. Turn binary logging on and add a server ID

CAUTION! DO NOT USE “log-bin=/var/log/mysql/mysql-bin.log”

[mysqld]
log-bin=mysql-bin
server-id=1
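Restart MySQL after changing the config. You can then verify that binary logging is on from the mysql console; the variable should report ON:

mysql> SHOW VARIABLES LIKE 'log_bin';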

2. Create a replication account on the master

GRANT REPLICATION SLAVE ON *.* TO 'repl'@'slave-ip-address' IDENTIFIED BY 'password-here';

3. Keep the above mysql console open, start a new terminal, and connect to the master via the new terminal

cd /var/lib/mysql

Now run ls; you should see all the database folders.

4. Time to lock all the databases on the master. The raised timeouts in the command below keep the locking session from being disconnected while the snapshot is taken.

mysql> SET GLOBAL WAIT_TIMEOUT=600000; SET WAIT_TIMEOUT = 600000; FLUSH TABLES WITH READ LOCK;

Now, in the new console, enter

root@server:/var/lib/mysql# tar -cvf /tmp/mysql-snapshot.tar ./ --exclude mysql &

Enter the following in the mysql console; it will show the master's position in the binary log. Keep this window open and note the File and Position columns:

SHOW MASTER STATUS;
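The output will look something like this (the values shown are illustrative); record File and Position for step 9:

+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |      107 |              |                  |
+------------------+----------+--------------+------------------+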

Once the tar command has finished, enter the following in the mysql console to release the lock:

UNLOCK TABLES;

5. Copy the tar file from the master to the slave server

scp -i permission-file.pem /tmp/mysql-snapshot.tar root@slave-server-ip:/root/

6. Extract the tar file on the slave server (stop MySQL on the slave first if it is running)

cd /var/lib/mysql

mv /root/mysql-snapshot.tar .

tar --extract --file=mysql-snapshot.tar
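If the extracted files do not end up owned by the mysql user, fix the ownership (the path assumes a standard layout):

chown -R mysql:mysql /var/lib/mysql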

7. Change the server ID on the slave MySQL server; add the following to the config file /etc/my.cnf or /etc/mysql/my.cnf

[mysqld]

server-id=2

8. Start or restart the slave MySQL server with

service mysql restart

9. Log in to the mysql console and enter

mysql> STOP SLAVE;

mysql> CHANGE MASTER TO
MASTER_HOST='ip-of-master-server',
MASTER_USER='repl',
MASTER_PASSWORD='replication_password-from-step-2',
MASTER_LOG_FILE='recorded_log_file_name-from-step-4',
MASTER_LOG_POS=recorded_log_position-from-step-4;

mysql> START SLAVE;

If everything is done correctly, the slave should start replicating.
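You can check progress from the slave's mysql console; both Slave_IO_Running and Slave_SQL_Running should say Yes:

mysql> SHOW SLAVE STATUS\G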

Once the master info has been set and the slave started, the slave will automatically resume replication after a restart.

If something is not right on the slave, log in to the slave's mysql console and run the following:

STOP SLAVE;

RESET SLAVE;

Now, from the bash console, re-extract the tar file, overwriting the existing files, with the following command:

root@slave-server:/var/lib/mysql# tar --overwrite --extract --file=mysql-snapshot.tar

Now repeat step 9.